A POWER STRUGGLE: BETWEEN- VS. WITHIN-SUBJECTS DESIGNS IN DEDUCTIVE REASONING RESEARCH

Psychologia, 2004, 47

Valerie A. THOMPSON and Jamie I. D. CAMPBELL
University of Saskatchewan, Canada

The design of this study was based on an Honour's Thesis project carried out by Katie Andrews. This research was funded by a grant from the Natural Sciences and Engineering Research Council of Canada. Correspondence concerning this article should be addressed to Valerie Thompson, Department of Psychology, University of Saskatchewan, 9 Campus Drive, Saskatoon, Canada, S7N 5A5 (e-mail: Valerie.Thompson@Usask.Ca).

This experiment examined the relative merits of using within- and between-subjects designs to investigate deductive reasoning. Two issues were investigated: 1) the potential for expectancy and fatigue effects when using within-subjects designs, and 2) the relative power of within- vs. between-subjects designs. Participants were presented with problems in a standard belief-bias paradigm in which the believability of putative conclusions varied orthogonally to their validity. The belief-bias effect, the effect of validity, and the interaction between beliefs and validity were not affected by reasoners' expectations regarding the number of problems they had to solve. The effect of beliefs and the belief by validity interaction were only marginally affected by the number of problems solved, despite adequate power to observe an effect. Thus, neither expectancy nor fatigue appears to have affected performance, suggesting that there are few drawbacks to using a within-subjects design. In contrast, however, a power analysis clearly established the desirability of using within- relative to between-subjects designs. Within-subjects designs require far fewer participants to detect effects of comparable size; this was especially true for higher-order (interaction) effects. Finally, we provide a power analysis of within- and between-subjects designs that should be of general utility to researchers planning studies using proportions as a dependent measure.

Key words: belief bias, deductive reasoning, statistical power, expectancy effects, carryover effects

The default method for investigating deductive reasoning is a within-subjects design in which participants are asked to solve a large number of problems in a single sitting (e.g., Braine, Reiser, & Rumain, 1984; Cherubini, Garnham, Oakhill, & Morley, 1998; Evans, Barston, & Pollard, 1983; Newstead, Pollard, Evans, & Allen, 1992; Newstead, Handley, & Buck, 1999; Revlin, Leirer, Yopp, & Yopp, 1980; Stanovich & West, 1997; Thompson, 1994, 2000). Given that these problems are typically difficult to solve, and that the problems tend to be similar to each other, expectancy, fatigue, and practice effects may jeopardise the interpretation of the experimental outcomes. One solution is to move to between-subjects designs, in which large numbers of participants are given a small number of problems to solve (Schaeken & Schroyens, 1997). This solution is not without difficulty, however. In addition to the cost entailed in testing large numbers of participants, between-subjects designs tend to be less powerful than their within-subjects counterparts (e.g., Zimmerman, 1997).

Thus, the question becomes whether the benefits of using a within-subjects design (i.e., smaller N, greater power) outweigh the potential drawbacks (i.e., expectancy, fatigue, and practice effects).

Beliefs and Validity in Deductive Reasoning

These issues were investigated in the context of the belief-bias effect, which is one of the most robust phenomena in the deductive reasoning literature. The belief-bias effect refers to the tendency to accept a conclusion that accords with one's beliefs, regardless of whether that conclusion is in fact valid. This effect has been widely replicated (e.g., Cherubini et al., 1998; Evans et al., 1983; Evans, Newstead, Allen, & Pollard, 1994; Evans & Pollard, 1990; Feather, 1964; Janis & Frick, 1943; Markovits & Bouffard-Bouchard, 1992; Markovits & Nantel, 1989; Morgan & Morton, 1944; Newstead et al., 1992; Oakhill & Johnson-Laird, 1985; Oakhill, Johnson-Laird, & Garnham, 1989; Revlin et al., 1980; Stanovich & West, 1997; Thompson, 1996; Torrens, Thompson, & Cramer, 1999). In contrast to the rather pessimistic view of human rationality suggested by the belief-bias effect, however, it has also been widely observed that reasoners accept more valid than invalid conclusions, regardless of believability (Evans et al., 1983; Evans et al., 1994; Newstead et al., 1992; Thompson, 1994, 1996; Torrens et al., 1999). Thus, reasoners' competence appears to reflect a mixture of rational and less rational processes (Evans & Over, 1996).

If those were the only phenomena observed, the explanation for them would be relatively straightforward, in that one might posit the existence of a set of analytical/logical processes and a set of non-analytical heuristic processes that contribute additively and independently to performance. However, there is a large interaction between these two factors, such that the effect of validity is larger for unbelievable than believable conclusions (Evans et al., 1983; Evans et al., 1994; Newstead et al., 1992; Thompson, 1996). This finding suggests that the execution of analytical/logical processes is constrained by the believability of the materials, such that logical processing is more likely to take place when considering an unbelievable as opposed to a believable or neutral conclusion (Newstead et al., 1992). Moreover, the ability to account for this pattern of findings, and to predict occasions on which the pattern may change, is commonly used as a litmus test to discriminate among various theories of deductive competence (see, for example, Evans et al., 1994; Klauer, Musch, & Naumer, 2000; Newstead et al., 1992).

The Relative Merits of Within- and Between-Subjects Designs

These effects (belief, logic, and their interaction) are robust, and have been replicated in a variety of studies. As mentioned earlier, the default method is a within-subjects design whereby reasoners are asked to complete a large number of very similar reasoning problems. A quick review of several recent belief-bias studies (Cherubini et al., 1998; Evans et al., 1983; Evans et al., 1994; Evans & Pollard, 1990; Markovits & Bouffard-Bouchard, 1992; Markovits & Nantel, 1989; Newstead et al., 1992; Oakhill & Johnson-Laird, 1985; Oakhill et al., 1989; Revlin et al., 1980; Stanovich & West, 1997; Thompson, 1996; Torrens et al., 1999) indicated that the median number of problems reasoners are asked to solve is 8 (mean = 9); typically, reasoners are asked to solve some multiple of four (allowing one problem in each belief by validity cell) or six problems (when a neutral belief condition is included).

Expectancy and fatigue effects. As participants are asked to solve a substantial number of difficult and relatively similar problems, the question arises as to whether reasoners continue to reason throughout the series, or instead adopt a non-logical strategy for reaching conclusions (Schaeken & Schroyens, 1997). For example, knowing that they have a large number of difficult problems to solve, reasoners might be less inclined at the outset to engage in a mentally demanding logical analysis. Furthermore, as the series of problems progresses, reasoners might be less and less inclined towards such an analysis. Expectancy and fatigue effects, therefore, might operate to reduce the effect of validity on reasoning, and highlight the contribution of non-logical processes, such as beliefs. The net outcome would be that the effect of validity would be smaller (and the belief-bias effect correspondingly larger) when reasoners expected many rather than few problems; similarly, such effects should also be more pronounced at the end of a long sequence of problems relative to the beginning. Moreover, if the probability of engaging in a logical analysis were reduced by expectancy or fatigue, so too should be the size of the belief by validity interaction, which presumably arises from this logical analysis.

In sum, the use of within-subjects designs allows the possibility that the effects observed may tell us as much about a task-specific strategy as they do about the constructs under investigation (see also Roberts, 1993; Thompson, 2000). In terms of the effects described here, the literature tells us that this is a plausible hypothesis: The effects of validity and beliefs can be manipulated (albeit with difficulty) by instructions that place relatively more or less emphasis on logical processes (Evans et al., 1994); the effects of belief and the search for counter-examples may be a matter of individual differences (Markovits, 1984; Torrens et al., 1999; Fugelsang & Thompson, 2000); and deductive reasoning in general is known to reflect task-specific as well as domain-independent processes (Thompson, 2000).

Power of within- and between-subjects designs. If expectancy and fatigue effects were significant factors in participants' reasoning, the logical alternative would be to move to a between-subjects design, wherein large numbers of participants are tested with relatively small numbers of problems. This is not done without cost, however. Despite the loss of degrees of freedom that occurs when data are analysed using within-subjects compared to between-subjects techniques, there is a significant loss of power associated with the use of between- relative to within-subjects designs, even when the correlation between paired observations is as small as .05 (Zimmerman, 1997). Put in this context, the problem then becomes whether the potential drawbacks associated with using within-subjects designs are enough to outweigh the loss of power entailed by a between-subjects analysis.

Rationale for the Current Experiment

The goal of this experiment was to investigate these issues in the context of the belief-bias effect. Participants were given multiple-model problems, similar to those used by Evans et al. (1994) and Newstead et al. (1992), and were asked to indicate whether or not a provided conclusion followed logically from the premises.

These problems are difficult and demanding, and are therefore appropriate for examining the potential problems of fatigue and expectancy. For example, a recent study using identical materials (Thompson, Striemer, Reikoff, Gunter, & Campbell, 2003) indicated that participants made errors on 35% of the problems (i.e., by accepting invalid or rejecting valid conclusions), and required an average of over 20 sec per problem to solve them. In the current study, participants were provided pairs of premises followed by conclusions to evaluate; the conclusions were either believable or unbelievable, and either followed validly from the premises or not. An example of a valid, believable syllogism is presented below:

Some well-educated people are Pennes
No judges are Pennes
Therefore, some well-educated people are not judges

To investigate the effects of expectancy, one group of reasoners, the Solve1 group, solved only a single problem; a second group, the Solve8 group, solved (and were told they were going to solve) eight problems. Eight was chosen because, as described above, it represents the typical number of problems that reasoners are asked to solve in experiments of this type. The first problem solved by the two groups was compared. If reasoners are less likely to engage in a logical analysis when they believe they must solve a large number of problems, then the effect of validity (i.e., the difference between valid and invalid conclusions accepted) should be smaller for the Solve8 than the Solve1 group. The interaction between belief and validity, which is also presumed to depend on logical analysis, should also be smaller for the Solve8 than the Solve1 group. Finally, if people who are faced with doing large numbers of problems opt for a non-logical strategy, such as basing their conclusions on their beliefs, then the effect of belief (i.e., the difference between believable and unbelievable conclusions accepted) should be larger in the Solve8 than the Solve1 condition.

Fatigue effects were examined in the context of the Solve8 group. This group solved two sets of problems, each containing one problem in each Belief × Validity cell. Again, to the extent that fatigue affects responses, the effect of belief should be larger, and the effect of validity and the belief by validity interaction smaller, for the second four than the first four problems.

In addition, the design of our study offered the opportunity to make a direct comparison of the power of comparable within- and between-subjects versions of the same variables. The comparison between the Solve1 and Solve8 groups is a completely between-subjects experiment: As only one problem from each participant was analysed, the effects of belief, validity, their interaction, and the three-way interaction between belief, validity, and expectations were all analysed between-subjects. In contrast, the data from the Solve8 group provide an analogous within-subjects model: Here belief, validity, and the order in which problems are solved were all analysed within-subjects. Thus, it was possible to make a direct comparison of the power afforded by within- and between-subjects versions of the experiment to detect effects of various sizes.

Although it is well known that within-subjects designs are more powerful than their between-subjects counterparts, it was nonetheless anticipated that the current analysis would provide concrete information regarding the relative tradeoffs in the context of deductive reasoning studies. To this end, we have provided a number of relevant calculations (such as the N required to detect effects of various sizes) that should be of general utility to researchers planning experiments in which the dependent variable is the proportion of conclusions accepted. Finally, we discuss the relevant statistical and methodological issues involved in planning and analysing these types of studies, and provide a concrete illustration of how power analyses can supplement statistical testing approaches.

METHOD

Participants: 774 students from introductory psychology courses at the University of Saskatchewan participated in partial fulfilment of a course requirement. Participants were tested during regular class time; assignment to the Solve1 (N = 382) and Solve8 (N = 392) groups was determined randomly. 64% were female, and the mean age was (SD = 3.28). Six percent indicated that they had previously taken a course in logic; because the response patterns of these people were practically identical to those who had not taken a course in logic, their data were retained in order to maximize the power of the analysis.

Materials: The materials consisted of three-term syllogisms. The logical structure of these syllogisms was identical to those used by Evans et al. (1994; Experiment 1), and is illustrated in Table 1. All of the problems combined one premise of the type "No ___ are ___" with one premise of the type "Some ___ are ___". These premise pairs were accompanied by a conclusion of the form "Some ___ are not ___". The problems were divided into two groups, one group leading to conclusions of the form "Some C are not A" and the other leading to conclusions of the form "Some A are not C". Valid and invalid versions of each problem were created, as illustrated in Table 1.

The problems presented to participants had the structure illustrated in Table 1, but with words substituted for the A, B, and C terms. The A and C terms of each problem referred to familiar categories, such as well-educated people and judges, and the B term was a nonsense term, such as Pennes (as per Newstead et al., 1992, Experiment 1; Markovits & Nantel, 1989). Eight different sets of A, B, and C terms were chosen for this study; each set of A, B, and C terms could be accompanied by either a believable or an unbelievable conclusion, yielding 16 unique problems (see Appendix A for a complete list). Evans et al. (1983) established the believability of these conclusions by asking participants to rate their believability on a scale of 1-7; believable conclusions had a mean rating of 6.38 (range = 5.75 to 6.94), whereas the unbelievable conclusions had a mean rating of 2.98 (range = 1.69 to 3.81).

Design and Procedure for the Solve8 Group: Participants in the Solve8 group received two blocks of four problems. One block of problems led to the "Some C are not A" conclusions, and the other block led to the "Some A are not C" conclusions (see Table 1). Across participants, the "Some C are not A" and "Some A are not C" problems appeared equally often in the first and second block of problems.
Each block of four problems consisted of one problem in each belief by validity cell; the order of these conditions was determined by a Latin square, which was repeated every four participants. Content was assigned to problems so that no problem content was repeated during the eight problems, and each problem content was assigned equally often to each belief by validity condition, and equally often to "Some C are not A" and "Some A are not C" problems.
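This counterbalancing scheme is simple to sketch in code. The snippet below is a minimal illustration, not the original (unpublished) program: it cycles the four belief by validity conditions through a simple cyclic Latin square that repeats every four participants, as described above.

from itertools import product

# The four belief x validity conditions that make up one block of problems.
CONDITIONS = list(product(("believable", "unbelievable"), ("valid", "invalid")))

def block_order(participant_index):
    """Return the condition order for one block of four problems.
    Rows of a cyclic 4 x 4 Latin square are reused every four
    participants, so each condition appears once in every serial
    position across any set of four consecutive participants."""
    row = participant_index % 4
    return [CONDITIONS[(row + col) % 4] for col in range(4)]

for p in range(4):
    print(p, block_order(p))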

Table 1. Logical Forms of the Problems Used in the Experiment

Group 1
  Valid:   No A are B. Some C are B. Therefore, some C are not A.
  Invalid: Some A are B. No C are B. Therefore, some C are not A.

Group 2
  Valid:   Some A are B. No C are B. Therefore, some A are not C.
  Invalid: No A are B. Some C are B. Therefore, some A are not C.

The counter-balancing was accomplished using a computer program, which also generated the test booklets. Two problems were presented on each page; following each problem were two options, "yes" and "no". Participants were instructed to circle the appropriate option for each problem, as described below (if answering logically, they would circle "yes" for the valid and "no" for the invalid problems). Instructions were written, and stapled to the front of the booklet. The instructions were adapted from Evans et al. (1994):

"This experiment is designed to examine how people solve logical problems. On the next page are eight logical reasoning problems. Your task is to decide whether the conclusion given below the problem follows logically from the information given in that problem. You must assume that all the information which you are given is true; this is very important. If, and only if, you judge that a given conclusion logically follows from the information given, you should circle YES below the conclusion on that page. If you think that the conclusion given does not necessarily follow from the information given, you should circle NO. Please take your time and be certain that you have the logically correct answer before making it. If you have any questions, please ask now. You must not make notes or diagrams of any kind to aid you in this task. Thank you very much for participating. For our records, please provide the following demographic data: Age: Gender: Any prior training in logic (e.g., Phil 105.3, Introduction to Logic)? YES/NO"

Design and Procedure for the Solve1 Group: These participants solved only one problem. This problem was the same as the first problem solved by people in the Solve8 group (i.e., the problem for each participant in the Solve1 group was matched to the first problem of a participant in the Solve8 group). Because of the way that the conditions were counterbalanced, the problems solved by the Solve1 group were equally likely to fall into each belief by validity cell, the problem contents were distributed equally among these problems, and equal numbers of problems had "Some C are not A" and "Some A are not C" conclusions. Because the materials were distributed in large classes, it was not possible to ensure that all of the cells were filled with exactly the same numbers of participants, but the actual distribution was very even, ranging between 91 and 99 participants per cell (mean N per cell was 97). The instructions given to this group were the same as those given to the Solve8 group, save that the word "one" was substituted for the word "eight" in the first paragraph.

RESULTS

Scoring and Analysis Strategy

A conclusion was deemed accepted if the participant circled "yes". The number of conclusions accepted was analysed by analysis of variance. The ANOVA was chosen because of its ubiquity as an analysis tool, because it allowed straightforward tests of the effects of interest, and because it offers a comparable statistical model for both between- and within-subjects analyses. Moreover, although the data were binomial in nature, as outlined below, the data obtained in the current experiment satisfied the major assumptions of the analysis. In cases where the data are not suitable for the ANOVA, alternative analysis techniques are available in the form of log-linear analyses for between-subjects designs, and sign tests for within-subjects designs.

For most purposes, binomial data satisfy the two main assumptions of the ANOVA, namely normality and homogeneity of variance (Edwards, 1985; McClave & Dietrich, 1994); moreover, the risks of Type I and Type II errors are not more pronounced for binomial than continuous data (Edwards, 1985; see also Holden & Overall, 1987). Specifically, with regard to the assumption of normality, the sampling distribution of the binomial approximates the normal distribution for moderate values of p and q: The larger the sample size, the more p and q can deviate from .5 and still approximate the normal curve. McClave and Dietrich (1994) suggest that if the values of p, q, and n satisfy the following condition, the distribution will be approximately normal:

0 < p ± 3√(pq/n) < 1

This condition is satisfied for all of the analyses reported below. A second assumption concerns homogeneity of variance, an assumption that is of particular importance to the binomial in that variances and means are correlated (i.e., when means differ, so do variances). However, the ANOVA is robust with respect to violations of this assumption, provided that the treatment groups are of equal size; specifically, so long as the larger variance does not exceed the smaller by a ratio of more than 4, the assumptions of the analysis are met (Edwards, 1985; Hays, 1994; Howell, 1992). Again, this assumption is met for all of the analyses reported below. Another assumption, which applies exclusively to within-subjects analyses, is the assumption of sphericity in the variance-covariance matrix. As the current analysis used only factors with two levels, sphericity was not a concern. In cases where researchers have more than two levels, many analysis programs will provide tests of the sphericity assumption, as well as appropriate corrections for cases where the assumption is not met.
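Both screening rules are mechanical to apply. The following is a minimal sketch; the proportions and cell size used are illustrative values of the magnitude reported in this paper, not the actual data.

import math

def normal_approx_ok(p, n):
    """McClave & Dietrich condition for treating binomial proportions as
    approximately normal: p +/- 3*sqrt(pq/n) must lie strictly inside (0, 1)."""
    q = 1.0 - p
    half_width = 3.0 * math.sqrt(p * q / n)
    return 0.0 < p - half_width and p + half_width < 1.0

def variances_homogeneous(variances, max_ratio=4.0):
    """Rule of thumb: the largest cell variance should not exceed the
    smallest by a ratio of more than four."""
    return max(variances) / min(variances) <= max_ratio

# Roughly 95 observations per between-subjects cell, with acceptance
# rates between about .30 and .86, as in the analyses reported below.
print(normal_approx_ok(p=0.30, n=95))                    # True
print(variances_homogeneous([0.19, 0.21, 0.23, 0.25]))   # True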

The final omnibus issue to be discussed concerns power. Following Cohen (1992a, 1992b), a benchmark of .80 will be adopted for the power analyses. Cohen suggests that this level of power represents a reasonable compromise between the ideal of perfect power (i.e., a probability equal to 1 of detecting experimental effects of interest) and the cost associated with testing large numbers of participants. Power curves generally have an ogival shape, and as they approach their asymptote, increasingly large numbers of participants are required to achieve increasingly small gains in power. For the purposes of this paper, therefore, if an analysis has 80% power to detect an effect at α = .05, that analysis will be deemed to have sufficient power. The power calculations reported in this paper are exact, and were computed using MorePower (Campbell & Thompson, 2002).

Table 2. Mean Proportion of Conclusions Accepted as a Function of Number Solved, Conclusion Believability, and Validity. [The cell entries of this table did not survive transcription.]

Within- vs. Between-Subjects Designs: Effects of Expectancy and Fatigue

Expectancy effects: Comparison of the Solve1 and Solve8 groups. If reasoners' performance is influenced by their expectations, one should observe a difference in performance between the Solve8 group, who expected to solve eight problems, and the Solve1 group, who expected to solve only one. For this analysis, the responses given by the Solve1 group were compared to those given to the first problem solved by the Solve8 group. The effects of expectancy should emerge as an interaction between number solved (1 vs. 8) and either validity or belief.

The mean proportion of conclusions accepted in each condition is presented in Table 2. The data were analysed using a 2 (Number Solved: 1 vs. 8) × 2 (Validity: valid vs. invalid) × 2 (Belief: believable vs. unbelievable) between-subjects ANOVA (MSE for all tests was .209; df = 1, 766). The analysis replicated the standard findings: More believable than unbelievable conclusions were accepted (.70 vs. .49; F = 39.3, p < .001); more valid than invalid conclusions were accepted (.73 vs. .46; F = 69.7, p < .001); and there was a significant interaction between belief and validity (F = 15.56, p < .001), such that the effect of belief was larger for invalid (.33) than valid (.08) conclusions. More importantly, none of the main effects or interactions involving number solved were significant (F ≤ 1.2), suggesting that performance on the first problem was not affected by reasoners' expectations regarding the number of problems they would be required to solve. In other words, the effects of belief, validity, and the interaction between beliefs and validity were the same for people who expected to solve a large number of problems relative to those who expected to solve a small number of problems. This analysis suggests that there is little cause for concern that reasoners' expectations influence the strategies they adopt when solving syllogisms of this sort.
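For readers who want to run this kind of analysis themselves, the sketch below fits a 2 × 2 × 2 between-subjects ANOVA on simulated 0/1 acceptance scores with statsmodels. The cell probabilities are loose approximations recoverable from the means reported above, not the actual data, and the absence of any effect of number solved is built into the simulation.

import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

rng = np.random.default_rng(1)

# Approximate acceptance rates, in the range reported in the text.
P_ACCEPT = {("valid", "believable"): 0.77, ("valid", "unbelievable"): 0.69,
            ("invalid", "believable"): 0.62, ("invalid", "unbelievable"): 0.30}

rows = []
for number in ("one", "eight"):                  # expectancy factor
    for (validity, belief), p in P_ACCEPT.items():
        for _ in range(95):                      # ~95 participants per cell
            rows.append({"number": number, "validity": validity,
                         "belief": belief, "accept": int(rng.random() < p)})
df = pd.DataFrame(rows)

# 2 x 2 x 2 between-subjects ANOVA on the 0/1 acceptance scores.
model = ols("accept ~ C(number) * C(validity) * C(belief)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))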

Note, however, that despite the large N (almost 800 participants), the fact that a between-subjects analysis was used meant that the probability of detecting the relevant effects was quite small, and that the foregoing conclusion must therefore be qualified by a lack of power. In the case of main effects, the analysis had reasonable power to detect only moderate effects: there was 80% power to detect a difference equal to .09 in the proportion of conclusions accepted in the Solve1 and Solve8 groups. For the interaction effects, only relatively large effects could be detected with reasonable probability: In order to detect an interaction between number solved and either validity or belief, the magnitude of the validity/belief effect would have to have been larger by .18 in one group relative to the other. For example, the difference between believable (M = .705) and unbelievable (M = .47) conclusions for the Solve1 group was about .24; this difference would need to be .42 in the Solve8 group before this experiment had 80% power to pick up the interaction statistically. In the case of the three-way interaction, the situation is even more grim: there was 80% power to detect only a very large change (i.e., .37) in the belief by validity interaction. In order for this to have happened, the interaction between belief and validity would need to have changed by more than a factor of 2. That is, for the Solve1 group, the difference between believable and unbelievable conclusions was larger by .33 for invalid relative to valid conclusions; to change by .37, this difference would have had to have more than doubled in size to .70 for the Solve8 group, or to have been reduced to less than 0 (i.e., such that the effect of beliefs was smaller by .04 for invalid relative to valid conclusions). Thus, even with close to 100 participants per cell, all but very large interaction effects would likely go undetected.

Table 3. Mean Proportion of Conclusions Accepted as a Function of Block, Conclusion Believability, and Validity. [The cell entries of this table did not survive transcription.]

Fatigue effects: Analysis of the Solve8 group. This analysis addressed the question of whether or not fatigue effects played a significant role in reasoners' performance. For this analysis, only the Solve8 group was analysed; performance on the first block of problems was compared to performance on the second block of problems. Fatigue effects should show up as differences in the effects of belief or validity on the two blocks of problems. The data are presented in Table 3; they were analysed using a 2 × 2 × 2 (Block × Belief × Validity) within-subjects ANOVA.

Note that data were deleted list-wise for this analysis, so that if an observation was missing in one cell, that person's data were excluded from all cells. The data from five participants were deleted for this reason, leaving an N of 387 for the analysis (df = 1, 386 for all tests). As expected, there was a large effect of belief (.76 vs. .45, F = 304, MSE = .23, p < .001), a large effect of validity (.75 vs. .46, F = 277, MSE = .25, p < .001), and an interaction between belief and validity (F = 36.4, MSE = .16, p < .001), such that the effect of belief was larger for invalid (.39) than valid (.21) conclusions. In addition, there was a small, but significant, effect of block, such that more conclusions were accepted for the second than the first block of problems (.62 vs. .59, F = 6.95, MSE = .14, p = .009). The interaction between block and belief approached significance (F = 3.60, MSE = .17, p = .06), revealing a tendency for the belief effect to be larger on the second (.33) than the first (.28) block of problems. The interaction between block and validity was not significant (F < 1). Unlike the between-subjects analysis, however, the lack of significance cannot be attributed to lack of power: This analysis had 80% power to detect a change of .08 in the effect of either belief or validity across blocks. It is therefore possible to conclude that whatever changes are produced in the effects of validity or belief as a result of fatigue, they are likely to be smaller than .08.

The three-way interaction also approached significance (F = 3.15, MSE = .16, p = .08): The two-way interaction between validity and belief was slightly larger on the first (.23) than the second (.12) block of problems, although it was significant for both blocks (F(1, 391) = 27.2, MSE = .18, p < .001, and F(1, 386) = 9.96, MSE = .14, p = .002, for the first and second blocks, respectively). The three-way interaction was then analysed using pair-wise comparisons, which indicated that differences between the first and second block were observed only for the valid believable problems: more valid believable conclusions were accepted in the second block than the first (t(402) = 3.07, p = .002). No other comparisons reached significance (t ≤ 1.32, p ≥ .19). In other words, the interaction between belief and block, as well as the three-way interaction between belief, validity, and block, can be attributed to just one cell; there was 80% power to detect a difference between blocks equal to or less than .08.

Summary. The preceding analyses indicated that there is relatively little cause for concern in using a within-subjects design. There was no detectable effect of expectancy, in that performance was similar regardless of whether reasoners expected to complete a large number of problems or a single problem. Performance did change slightly as a function of the number of problems solved; however, this change was relatively small, and restricted to a single cell of the experiment (i.e., valid believable problems). All told, the evidence suggests that expectancy and fatigue effects are unlikely to jeopardise the interpretation of within-subjects designs, at least for the number of problems used in this experiment (see footnote 1).
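The within-subjects analogue of the preceding analyses can be sketched with statsmodels' repeated-measures ANOVA. The data below are again simulated (one 0/1 score per subject in each cell, generated from a simple additive model with no block effects), so the output illustrates the mechanics rather than reproducing the results above; AnovaRM requires complete cases, matching the listwise deletion used here.

import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(2)

# One 0/1 acceptance score per subject in every Block x Belief x Validity cell.
rows = []
for subject in range(387):
    for block in ("first", "second"):
        for belief in ("believable", "unbelievable"):
            for validity in ("valid", "invalid"):
                p = 0.30 + 0.25 * (validity == "valid") + 0.20 * (belief == "believable")
                rows.append({"subject": subject, "block": block, "belief": belief,
                             "validity": validity, "accept": int(rng.random() < p)})
df = pd.DataFrame(rows)

res = AnovaRM(df, depvar="accept", subject="subject",
              within=["block", "belief", "validity"]).fit()
print(res)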

Within- vs. Between-Subjects Designs: Power Analysis

Thus, there seem to be few drawbacks to using a within-subjects design, and the advantages in terms of power are enormous. A complete power analysis is presented below, but the general point can be illustrated using examples from the preceding analyses. The analysis of number solved, a between-subjects factor, had twice the N of the analysis of the within-subjects block effect, yet it had substantially less power to detect both main effects and interaction effects. The within-subjects analysis detected a difference of .03 in the number of conclusions accepted in the first and second block; the between-subjects analysis had 80% power to detect a corresponding difference of .09 between the Solve1 and Solve8 groups. Likewise for the interaction effects: The within-subjects analysis had 80% power to detect a change of .08 in the effect of validity or belief across blocks, whereas the between-subjects analysis had comparable power to detect a change of .18 between the Solve1 and Solve8 groups; the within-subjects analysis had 80% power to detect a three-way interaction equal to .16, whereas for the between-subjects analysis, this number was .37. Thus, even with only half the number of participants, the within-subjects design was much more powerful than its between-subjects counterpart.

Error variance in within- vs. between-subjects designs. In a within-subjects design, each subject serves as his/her own control, thereby reducing the error variance (i.e., MSE), so that equivalently sized treatment effects (i.e., as computed for MST in the ANOVA) are more likely to produce a significant test for within- relative to between-subjects designs. The size of the error variance is reduced in proportion to the correlation between conditions (e.g., between the rate of acceptance of believable and unbelievable conclusions): the higher the correlation, the greater the reduction. In general, the variance of the difference between two means can be computed from the following formula:

s²(x1 − x2) = s²(x1) + s²(x2) − 2·r·s(x1)·s(x2)

When r = 0, as is the case when one has independent observations in a between-subjects design, the variance of the difference between means will simply be the sum of the two variances. For example, in a between-subjects ANOVA, MSE is computed by summing the variances of the individual cells, weighting them by the number of observations per cell. In contrast, when r > 0, as for within-subjects designs, the variance is reduced as a function of the size of the correlation. Because the error variance is smaller, the value of F or t will be larger for within- relative to between-subjects designs, resulting in more power, even when r is small (Zimmerman, 1997; see footnote 2).

Estimating power for within- and between-subjects analyses. The current data can be used to illustrate this point. For these data, the correlation between conditions is small: for main effects, the average correlation between conditions is .15; for the 2-way interactions, the average is .10; and for the 3-way interaction, the average correlation among cells is .06.
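The consequence of this formula for planning can be shown numerically. The sketch below uses statsmodels' normal-theory t-test power routines, which approximate the exact MorePower calculations used in this paper, to show how the N required for 80% power shrinks as the correlation r between conditions grows; the specific SD and effect values are illustrative assumptions.

import math
from statsmodels.stats.power import TTestPower, TTestIndPower

def sd_of_difference(s1, s2, r):
    """SD of a difference score: the within-subjects error term shrinks
    as the correlation r between conditions grows."""
    return math.sqrt(s1 ** 2 + s2 ** 2 - 2 * r * s1 * s2)

s = 0.48     # per-cell SD of a 0/1 score with p near .5 (sqrt(pq))
diff = 0.10  # raw effect: a .10 difference between two proportions

for r in (0.0, 0.15, 0.50):
    dz = diff / sd_of_difference(s, s, r)          # within-subjects effect size
    n = TTestPower().solve_power(effect_size=dz, alpha=0.05, power=0.80)
    print(f"r = {r:.2f}: total within-subjects N = {math.ceil(n)}")

# Between-subjects test of the same raw difference (d = diff / s).
n_group = TTestIndPower().solve_power(effect_size=diff / s, alpha=0.05, power=0.80)
print(f"between-subjects N per group = {math.ceil(n_group)}")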
Footnote 1: To verify that our observations could be extended to a larger number of problems, we have collected data from 226 participants who solved 16 problems. These data afforded conclusions consistent with those reported here: There was a 3-way interaction between block, validity, and belief that again could be attributed to a single cell: Participants were more likely to accept invalid, unbelievable conclusions in the second than the first block; otherwise, the proportions of conclusions accepted in the first and second blocks were statistically identical (t ≤ 1.60, p ≥ .11, with 80% power to detect a difference equal to or less than .08 between the blocks).

Footnote 2: When r is 0 or negative, the researcher is better off analysing the data using a between-subjects model.

Fig. 1. Power curves for main effects, 2 × 2 interactions, and 2 × 2 × 2 interactions for within- and between-subjects analyses. [Figure not reproduced.]

Even so, the within-subjects analysis was enormously more powerful than its between-subjects equivalent, as demonstrated in Fig. 1. This figure plots the power available to detect effects of various magnitudes, given sample sizes ranging from 16 to 296 participants. These numbers were calculated using the MSE from the ANOVAs reported in the previous sections to estimate error variance (see footnote 4). Effect magnitudes are computed as follows: For main effects, effect magnitude is the size of the difference between two proportions (e.g., the proportion of believable minus the proportion of unbelievable conclusions accepted); for interactions, effect magnitude is the difference of differences between four proportions; and the magnitude of the three-way interaction is the difference of the difference of differences. For example, for the data reported in Table 3 (see footnote 3), the magnitude of the validity effect is .29 (.75 − .46), the size of the Belief × Validity interaction is .18 ([.65 − .26] − [.86 − .65]), and the magnitude of the three-way interaction is .11 (the size of the Belief × Validity interaction for the first block [.23] minus that for the second block [.12]). Reporting effect magnitudes in this manner allows the interested reader to use the data in Fig. 1 to estimate power for his/her own experiments in conceptually meaningful units.

The figure affords two simple conclusions. First, the power of a within-subjects design is much greater than that of the between-subjects equivalent for all types of effects. A relatively small sample (N = 24) has adequate power (i.e., .84) to detect a main effect equal to .20 in a within-subjects design; the equivalent between-subjects design has power equal to about .16. Similar statements apply to the interaction effects: A moderately sized sample (N = 48) has adequate (i.e., .85) power to detect a 2 × 2 interaction equal to .25 in a within-subjects design; the comparable between-subjects design has power equal to .14. Second, the power of a given design to detect an effect varies according to the order of the effect. Both the within- and between-subjects designs have more power to detect main effects than 2 × 2 interactions of comparable magnitude, and more power to detect a 2 × 2 interaction than a 3-way interaction of comparable magnitude. For example, for a within-subjects design, one has power of about .8 to detect a main effect equal to .25 with 16 participants; that number of participants has power equal to .36 to detect a 2 × 2 interaction of that magnitude, and .16 to detect a 3-way interaction. Indeed, even with 296 participants, the between-subjects design lacks sufficient power to detect even very large three-way interactions.

Estimating sample sizes for within- and between-subjects designs. A similar conclusion can be drawn from the data presented in Table 4, which tabulates the sample sizes needed to detect effects of various sizes. These data emphasise the point that the within-subjects design has many times more power than its between-subjects counterpart to detect comparable effects.
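These magnitude conventions reduce to simple arithmetic on cell proportions; a small helper makes them concrete, using the Solve8 belief by validity cells quoted in the paragraph above as a worked example.

def main_effect(p1, p2):
    """Main effect magnitude: a difference of two proportions."""
    return p1 - p2

def interaction_2x2(cells):
    """2 x 2 interaction magnitude: a difference of differences.
    cells = ((a1b1, a1b2), (a2b1, a2b2)) of acceptance proportions."""
    (a1b1, a1b2), (a2b1, a2b2) = cells
    return (a1b1 - a1b2) - (a2b1 - a2b2)

def interaction_2x2x2(cells_at_c1, cells_at_c2):
    """3-way magnitude: the difference of two 2 x 2 interactions."""
    return interaction_2x2(cells_at_c1) - interaction_2x2(cells_at_c2)

# Worked example from the text: believable vs. unbelievable acceptance
# rates for invalid (.65, .26) and valid (.86, .65) conclusions.
print(main_effect(0.75, 0.46))                         # .29 validity effect
print(interaction_2x2(((0.65, 0.26), (0.86, 0.65))))   # ~.18 Belief x Validity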
Footnote 3: For more precise calculations, the reader is invited to use the MorePower calculator, which is available as freeware. To estimate error variance, one can use the MSE terms reported in the results section of this paper, or derive estimates for the variance of the difference between groups (s²d) from existing data using the formula provided in the section titled "Error variance in within- vs. between-subjects designs."

Footnote 4: For the between-subjects design, df were computed as for a 2 × 2 × 2 ANOVA.

Table 4. N Needed to Detect Effects of Various Sizes With Power = .8. [Most cell entries of this table did not survive transcription. The surviving values show that, within-subjects, a 2 × 2 interaction of .05 requires a total N of 1,004, and a 2 × 2 × 2 interaction of .05 a total N of 4,015; the corresponding between-subjects versions require on the order of ten times as many participants.]

Note: Between-subjects Ns given are for the entire experiment. To compute the number needed per treatment condition, divide the total N by two for the main effects, by four for the 2 × 2 interaction, and by eight for the 2 × 2 × 2 interaction. df are computed as for a 2 × 2 × 2 design.

In all cases, the between-subjects version required many more participants than the within-subjects counterpart to detect comparable effects; in some cases, up to 10 times as many.

The generality of this analysis. Although the power analyses described in the preceding sections were based on error estimates taken from a specific experiment, they can also serve as a model for researchers designing experiments of this type. Because of the large sample size, the error variances should provide reasonably accurate parameter estimates. Moreover, because the variances used to provide the power estimates approached their maximal value (i.e., the variances in individual cells are close to .25, and the correlations between cells in the within-subjects analysis are small), the analyses should provide researchers with a conservative estimate for most types of experiments using binomial proportions as a dependent measure. Thus, researchers can use Fig. 1 to determine the power of their experiments to detect effects of various sizes, and can use Table 4 when planning experiments to estimate the number of participants needed in order to be able to detect the effects they are interested in.

Alternative analysis techniques. Even though the ANOVA provides a satisfactory statistical model for most binomial data, most researchers opt for non-parametric tests. In this section, the power of the ANOVA is contrasted with the power of the most commonly used alternative techniques. In the case of between-subjects analyses, the most appropriate alternative test is the χ² test. In terms of power, this test will be roughly equivalent to the ANOVA. The reason for this is that the observed value of χ² for a 2 × 2 contingency table is equal to z² (Howell, 1992). Thus, because t² = F and t ≈ z (so that t² ≈ z² = χ²), it follows that F ≈ χ². Consequently, the test of a main effect having two levels is roughly the equivalent of a χ² test on a 2 × 2 contingency table. For example, if one does a chi-square analysis on the effect of expectancy (Solve1 vs. Solve8) using the data reported in Table 2, the computed χ² is .136; this is the same value that would be obtained for F in a single-factor ANOVA with conclusions accepted as the dependent variable and number solved as the independent variable; the chi-square values for belief (33.8) and validity (61.5) also closely approximate the computed values of F (35.3 and 66.6, respectively). Unlike the chi-square test, however, the ANOVA has the flexibility to examine interaction effects; in cases where interaction effects are of interest, log-linear analyses should be preferred (Agresti, 1996).

For within-subjects designs, the most commonly used alternative to the ANOVA is the sign test. This test can easily be adapted to examine both main effects and interaction effects. However, because this test ignores the magnitude of effects in favour of examining the sign of the differences between conditions, it is generally less powerful than parametric tests (Howell, 1992).
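The near-equivalence of χ² and F for a two-level between-subjects factor is easy to verify numerically. In the sketch below, the group sizes match the Solve1 and Solve8 groups, but the 0/1 scores are simulated under assumed acceptance rates; the χ² is computed without the Yates correction so that the z² identity holds.

import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Two independent groups with a 0/1 outcome (conclusion accepted or not).
g1 = (rng.random(382) < 0.60).astype(int)
g2 = (rng.random(392) < 0.58).astype(int)

# Chi-square on the 2 x 2 contingency table of group by accept/reject.
table = np.array([[g1.sum(), len(g1) - g1.sum()],
                  [g2.sum(), len(g2) - g2.sum()]])
chi2, p_chi, _, _ = stats.chi2_contingency(table, correction=False)

# One-way ANOVA (equivalently F = t**2) on the same 0/1 scores.
f_stat, p_f = stats.f_oneway(g1, g2)

print(f"chi2 = {chi2:.3f}, F = {f_stat:.3f}")  # nearly identical values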

The corollary to this, however, is that with large numbers of subjects, the ANOVA may be spuriously powerful, and detect statistically significant, but trivial, differences. The effect of order in the current experiment is a case in point: the difference between the proportions of conclusions accepted in the first and second blocks was tiny (i.e., 3%). This difference was significant despite the fact that a majority of participants (.63) in this study did not show the effect.

GENERAL DISCUSSION

These findings have several clear implications for research in this field. The default model of experimental design is within-subjects, using about eight problems in total. These problems are difficult and time consuming, raising the potential problem that extraneous variables such as expectancy and fatigue may be mediating performance. Specifically, we proposed that participants might be more inclined to use a less demanding, heuristic strategy when they thought they were going to solve eight problems rather than one, or that the effort devoted to logical analysis would decline over a series of trials. These factors would reduce the effects of logic, and exacerbate the effects of belief. However, based on our data, there was no evidence to suggest that these problems are of concern: Reasoners' expectations did not affect performance, and with one exception (in which accuracy increased over blocks), performance did not differ between the first and second block of trials.

Of course, an alternative interpretation of these findings is that reasoners do not use a demanding, logic-based strategy to start with (e.g., Gilhooly, Logie, Wetherick, & Wynn, 1993; Gilhooly, Logie, & Wynn, 1999; Oaksford, Roberts, & Chater, 2002; Quayle & Ball, 2000), so that effects of fatigue would not be noticed. Indeed, the available evidence suggests that many reasoners rely on non-logical strategies to solve syllogisms. Nonetheless, the heuristics that they use still require working memory resources (Gilhooly et al., 1993), such that secondary task demands cause participants to shift from relatively sophisticated to relatively more primitive strategies (Gilhooly et al., 1999). Thus, even under the assumption that reasoners use non-logical strategies as a default, one would still expect fatigue to result in a shift of strategies over trials.

In sum, although fatigue and practice will presumably become problematic at some point, within the range of problems typically used in these types of studies, there seems little cause for concern (see also footnote 1). Moreover, because the power of the within-subjects design is so much greater than that of its between-subjects counterpart, it should be preferred as the default paradigm. Unless the researcher has good reason to believe that carry-over or practice effects are likely to be an issue, s/he is advised to opt for a within-subjects design. This is true even when one expects performance in the various conditions of the experiment to be only slightly correlated. In the current study, the correlations between conditions were small (in some cases less than .10), and yet the within-subjects design was several times more powerful than its between-subjects counterpart. In addition, because the advantage of the within- relative to the between-subjects design will increase as a function of the correlation among conditions, the power advantage for within-subjects designs observed in the current experiment can be thought of as a lower bound; researchers working with designs where performance is more highly correlated across conditions will obtain even more benefit from the within-subjects design.

There are two potential concerns about using within-subjects designs that may make them less attractive. One regards the uncertainty surrounding appropriate error terms for post-hoc analyses. This is problematic when the assumptions of homogeneity of variance and sphericity have not been satisfied. As per the earlier discussion, concerns about homogeneity of variance are most likely to arise when the values of p and q (the percentages of conclusions accepted and rejected) deviate greatly from .5, and when sample size is small. In cases where the assumptions have not been met, using independent variance estimates may be the most appropriate approach (Masson & Loftus, 2003). Second, because within-subjects designs use fewer degrees of freedom than their between-subjects counterparts, they are potentially less powerful, especially for small samples, where even small losses in df can result in quite large increases in the F or t value needed for a significant test. However, as mentioned earlier, even when the correlation between cells is small, the gain in power that comes from the within-subjects analysis outweighs the potential loss in degrees of freedom (Zimmerman, 1997). Fig. 1 also illustrates this point. A small sample (N = 16) has about 80% power to detect a main effect equal to .25 in a within-subjects analysis; a comparably sized sample in the between-subjects case has power less than 20%. Even using 16 participants per cell (N = 32), the between-subjects analysis has only 30% power to find an equivalent effect.

Clearly, however, there are occasions when carry-over effects render the use of within-subjects designs inappropriate, e.g., when one suspects that insight or learning may take place and alter responses on subsequent trials, or when early trials induce a response bias that influences performance on later trials. The effects of instructions or training on performance also must be examined using a between-subjects design. For example, a question whose answer has relevance for modern theories of reasoning is whether or not the belief-bias effect can be ameliorated by providing reasoners with explicit instructions (Evans et al., 1994; Newstead et al., 1992). The evidence on this point is mixed, with some studies suggesting that it is possible to reduce belief bias and others showing that it is not. The problem may be one of power. To find a reduction in the belief-bias effect requires testing for an interaction between belief and another variable (in this case, instruction condition). The largest sample size used in these studies was 72; even using a within-subjects design, over 100 participants would be required to observe a reduction of the belief effect from its usual value (about .30) to half its normal size (about .15) with power = .8. Using a mixed design, with instructions as a between-subjects variable and believability as a within-subjects variable, one would need 162 participants to observe a comparable effect. To observe an effect equal to .10 (i.e., a reduction from .30 to .20), one would need over 350 participants.

Even more sobering are experiments that hinge on the presence or absence of the belief by validity interaction. In two influential papers, researchers have made persuasive


Repeated Measures ANOVA and Mixed Model ANOVA. Comparing more than two measurements of the same or matched participants Repeated Measures ANOVA and Mixed Model ANOVA Comparing more than two measurements of the same or matched participants Data files Fatigue.sav MentalRotation.sav AttachAndSleep.sav Attitude.sav Homework:

More information

A Spreadsheet for Deriving a Confidence Interval, Mechanistic Inference and Clinical Inference from a P Value

A Spreadsheet for Deriving a Confidence Interval, Mechanistic Inference and Clinical Inference from a P Value SPORTSCIENCE Perspectives / Research Resources A Spreadsheet for Deriving a Confidence Interval, Mechanistic Inference and Clinical Inference from a P Value Will G Hopkins sportsci.org Sportscience 11,

More information

Business Statistics Probability

Business Statistics Probability Business Statistics The following was provided by Dr. Suzanne Delaney, and is a comprehensive review of Business Statistics. The workshop instructor will provide relevant examples during the Skills Assessment

More information

10 Intraclass Correlations under the Mixed Factorial Design

10 Intraclass Correlations under the Mixed Factorial Design CHAPTER 1 Intraclass Correlations under the Mixed Factorial Design OBJECTIVE This chapter aims at presenting methods for analyzing intraclass correlation coefficients for reliability studies based on a

More information

CHAPTER VI RESEARCH METHODOLOGY

CHAPTER VI RESEARCH METHODOLOGY CHAPTER VI RESEARCH METHODOLOGY 6.1 Research Design Research is an organized, systematic, data based, critical, objective, scientific inquiry or investigation into a specific problem, undertaken with the

More information

Chapter 23. Inference About Means. Copyright 2010 Pearson Education, Inc.

Chapter 23. Inference About Means. Copyright 2010 Pearson Education, Inc. Chapter 23 Inference About Means Copyright 2010 Pearson Education, Inc. Getting Started Now that we know how to create confidence intervals and test hypotheses about proportions, it d be nice to be able

More information

Unit 1 Exploring and Understanding Data

Unit 1 Exploring and Understanding Data Unit 1 Exploring and Understanding Data Area Principle Bar Chart Boxplot Conditional Distribution Dotplot Empirical Rule Five Number Summary Frequency Distribution Frequency Polygon Histogram Interquartile

More information

Running head: How large denominators are leading to large errors 1

Running head: How large denominators are leading to large errors 1 Running head: How large denominators are leading to large errors 1 How large denominators are leading to large errors Nathan Thomas Kent State University How large denominators are leading to large errors

More information

Research Methods II, Spring Term Logic of repeated measures designs

Research Methods II, Spring Term Logic of repeated measures designs Research Methods II, Spring Term 2003 1 Logic of repeated measures designs Imagine you want to test the effect of drinking Fosters on the time it takes to respond to a light turning red. The independent

More information

Mantel-Haenszel Procedures for Detecting Differential Item Functioning

Mantel-Haenszel Procedures for Detecting Differential Item Functioning A Comparison of Logistic Regression and Mantel-Haenszel Procedures for Detecting Differential Item Functioning H. Jane Rogers, Teachers College, Columbia University Hariharan Swaminathan, University of

More information

Belief-based and analytic processing in transitive inference depends on premise integration difficulty

Belief-based and analytic processing in transitive inference depends on premise integration difficulty Memory & Cognition 2010, 38 (7), 928-940 doi:10.3758/mc.38.7.928 Belief-based and analytic processing in transitive inference depends on premise integration difficulty GLENDA ANDREWS Griffith University,

More information

Understandable Statistics

Understandable Statistics Understandable Statistics correlated to the Advanced Placement Program Course Description for Statistics Prepared for Alabama CC2 6/2003 2003 Understandable Statistics 2003 correlated to the Advanced Placement

More information

Title: A robustness study of parametric and non-parametric tests in Model-Based Multifactor Dimensionality Reduction for epistasis detection

Title: A robustness study of parametric and non-parametric tests in Model-Based Multifactor Dimensionality Reduction for epistasis detection Author's response to reviews Title: A robustness study of parametric and non-parametric tests in Model-Based Multifactor Dimensionality Reduction for epistasis detection Authors: Jestinah M Mahachie John

More information

MMI 409 Spring 2009 Final Examination Gordon Bleil. 1. Is there a difference in depression as a function of group and drug?

MMI 409 Spring 2009 Final Examination Gordon Bleil. 1. Is there a difference in depression as a function of group and drug? MMI 409 Spring 2009 Final Examination Gordon Bleil Table of Contents Research Scenario and General Assumptions Questions for Dataset (Questions are hyperlinked to detailed answers) 1. Is there a difference

More information

Does pure water boil, when it s heated to 100 C? : The Associative Strength of Disabling Conditions in Conditional Reasoning

Does pure water boil, when it s heated to 100 C? : The Associative Strength of Disabling Conditions in Conditional Reasoning Does pure water boil, when it s heated to 100 C? : The Associative Strength of Disabling Conditions in Conditional Reasoning Wim De Neys (Wim.Deneys@psy.kuleuven.ac.be) Department of Psychology, K.U.Leuven,

More information

CHECKLIST FOR EVALUATING A RESEARCH REPORT Provided by Dr. Blevins

CHECKLIST FOR EVALUATING A RESEARCH REPORT Provided by Dr. Blevins CHECKLIST FOR EVALUATING A RESEARCH REPORT Provided by Dr. Blevins 1. The Title a. Is it clear and concise? b. Does it promise no more than the study can provide? INTRODUCTION 2. The Problem a. It is clearly

More information

FULL REPORT OF RESEARCH ACTIVITIES. Background

FULL REPORT OF RESEARCH ACTIVITIES. Background FULL REPORT OF RESEARCH ACTIVITIES Background There has been a recent upsurge of interest in individual differences in reasoning which has been well summarised by Stanovich & West (2000). The reason for

More information

Critical Thinking Assessment at MCC. How are we doing?

Critical Thinking Assessment at MCC. How are we doing? Critical Thinking Assessment at MCC How are we doing? Prepared by Maura McCool, M.S. Office of Research, Evaluation and Assessment Metropolitan Community Colleges Fall 2003 1 General Education Assessment

More information

CFSD 21 st Century Learning Rubric Skill: Critical & Creative Thinking

CFSD 21 st Century Learning Rubric Skill: Critical & Creative Thinking Comparing Selects items that are inappropriate to the basic objective of the comparison. Selects characteristics that are trivial or do not address the basic objective of the comparison. Selects characteristics

More information

Working Memory Span and Everyday Conditional Reasoning: A Trend Analysis

Working Memory Span and Everyday Conditional Reasoning: A Trend Analysis Working Memory Span and Everyday Conditional Reasoning: A Trend Analysis Wim De Neys (Wim.Deneys@psy.kuleuven.ac.be) Walter Schaeken (Walter.Schaeken@psy.kuleuven.ac.be) Géry d Ydewalle (Géry.dYdewalle@psy.kuleuven.ac.be)

More information

Investigating the robustness of the nonparametric Levene test with more than two groups

Investigating the robustness of the nonparametric Levene test with more than two groups Psicológica (2014), 35, 361-383. Investigating the robustness of the nonparametric Levene test with more than two groups David W. Nordstokke * and S. Mitchell Colp University of Calgary, Canada Testing

More information

Wason's Cards: What is Wrong?

Wason's Cards: What is Wrong? Wason's Cards: What is Wrong? Pei Wang Computer and Information Sciences, Temple University This paper proposes a new interpretation

More information

Different developmental patterns of simple deductive and probabilistic inferential reasoning

Different developmental patterns of simple deductive and probabilistic inferential reasoning Memory & Cognition 2008, 36 (6), 1066-1078 doi: 10.3758/MC.36.6.1066 Different developmental patterns of simple deductive and probabilistic inferential reasoning Henry Markovits Université du Québec à

More information

The Effect of response format on syllogistic reasoning

The Effect of response format on syllogistic reasoning Calvert Undergraduate Research Awards University Libraries Lance and Elena Calvert Award for Undergraduate Research 2011 The Effect of response format on syllogistic reasoning Adam S. Billman University

More information

Examining differences between two sets of scores

Examining differences between two sets of scores 6 Examining differences between two sets of scores In this chapter you will learn about tests which tell us if there is a statistically significant difference between two sets of scores. In so doing you

More information

Study Guide for the Final Exam

Study Guide for the Final Exam Study Guide for the Final Exam When studying, remember that the computational portion of the exam will only involve new material (covered after the second midterm), that material from Exam 1 will make

More information

A Comparison of Several Goodness-of-Fit Statistics

A Comparison of Several Goodness-of-Fit Statistics A Comparison of Several Goodness-of-Fit Statistics Robert L. McKinley The University of Toledo Craig N. Mills Educational Testing Service A study was conducted to evaluate four goodnessof-fit procedures

More information

The Regression-Discontinuity Design

The Regression-Discontinuity Design Page 1 of 10 Home» Design» Quasi-Experimental Design» The Regression-Discontinuity Design The regression-discontinuity design. What a terrible name! In everyday language both parts of the term have connotations

More information

Scale Invariance and Primacy and Recency Effects in an Absolute Identification Task

Scale Invariance and Primacy and Recency Effects in an Absolute Identification Task Neath, I., & Brown, G. D. A. (2005). Scale Invariance and Primacy and Recency Effects in an Absolute Identification Task. Memory Lab Technical Report 2005-01, Purdue University. Scale Invariance and Primacy

More information

Psychology Research Process

Psychology Research Process Psychology Research Process Logical Processes Induction Observation/Association/Using Correlation Trying to assess, through observation of a large group/sample, what is associated with what? Examples:

More information

Why Does Similarity Correlate With Inductive Strength?

Why Does Similarity Correlate With Inductive Strength? Why Does Similarity Correlate With Inductive Strength? Uri Hasson (uhasson@princeton.edu) Psychology Department, Princeton University Princeton, NJ 08540 USA Geoffrey P. Goodwin (ggoodwin@princeton.edu)

More information

Two-Way Independent ANOVA

Two-Way Independent ANOVA Two-Way Independent ANOVA Analysis of Variance (ANOVA) a common and robust statistical test that you can use to compare the mean scores collected from different conditions or groups in an experiment. There

More information

PSYCHOLOGY Vol. II - Experimentation in Psychology-Rationale, Concepts and Issues - Siu L. Chow

PSYCHOLOGY Vol. II - Experimentation in Psychology-Rationale, Concepts and Issues - Siu L. Chow EXPERIMENTATION IN PSYCHOLOGY RATIONALE, CONCEPTS, AND ISSUES Siu L. Chow Department of Psychology, University of Regina, Canada Keywords: conditional syllogism, control, criterion of falsification, experiment,

More information

J. V. Oakhill a & P. N. Johnson-Laird a a MRC Perceptual and Cognitive Performance Unit,

J. V. Oakhill a & P. N. Johnson-Laird a a MRC Perceptual and Cognitive Performance Unit, This article was downloaded by: [Princeton University] On: 24 February 2013, At: 11:51 Publisher: Routledge Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer

More information

VERDIN MANUSCRIPT REVIEW HISTORY REVISION NOTES FROM AUTHORS (ROUND 2)

VERDIN MANUSCRIPT REVIEW HISTORY REVISION NOTES FROM AUTHORS (ROUND 2) 1 VERDIN MANUSCRIPT REVIEW HISTORY REVISION NOTES FROM AUTHORS (ROUND 2) Thank you for providing us with the opportunity to revise our paper. We have revised the manuscript according to the editors and

More information

Matching bias in the selection task is not eliminated by explicit negations

Matching bias in the selection task is not eliminated by explicit negations THINKING & REASONING, 2008, 14 (3), 281 303 Matching bias in the selection task is not eliminated by explicit negations Christoph Stahl and Karl Christoph Klauer University of Freiburg, Germany Edgar Erdfelder

More information

PSY 3393 Experimental Projects Spring 2008

PSY 3393 Experimental Projects Spring 2008 PSY 3393 Experimental Projects Spring 2008 Dr. Peter Assmann Assignment: journal article report Find an article on a topic of special interest to you from any peer-reviewed journal in Psychology, Neuroscience

More information

3 CONCEPTUAL FOUNDATIONS OF STATISTICS

3 CONCEPTUAL FOUNDATIONS OF STATISTICS 3 CONCEPTUAL FOUNDATIONS OF STATISTICS In this chapter, we examine the conceptual foundations of statistics. The goal is to give you an appreciation and conceptual understanding of some basic statistical

More information

UNEQUAL CELL SIZES DO MATTER

UNEQUAL CELL SIZES DO MATTER 1 of 7 1/12/2010 11:26 AM UNEQUAL CELL SIZES DO MATTER David C. Howell Most textbooks dealing with factorial analysis of variance will tell you that unequal cell sizes alter the analysis in some way. I

More information

Technical Specifications

Technical Specifications Technical Specifications In order to provide summary information across a set of exercises, all tests must employ some form of scoring models. The most familiar of these scoring models is the one typically

More information

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo Business Statistics The following was provided by Dr. Suzanne Delaney, and is a comprehensive review of Business Statistics. The workshop instructor will provide relevant examples during the Skills Assessment

More information

Journal of Experimental Psychology: Learning, Memory, and Cognition

Journal of Experimental Psychology: Learning, Memory, and Cognition Journal of Experimental Psychology: Learning, Memory, and Cognition Conflict and Bias in Heuristic Judgment Sudeep Bhatia Online First Publication, September 29, 2016. http://dx.doi.org/10.1037/xlm0000307

More information

One-Way ANOVAs t-test two statistically significant Type I error alpha null hypothesis dependant variable Independent variable three levels;

One-Way ANOVAs t-test two statistically significant Type I error alpha null hypothesis dependant variable Independent variable three levels; 1 One-Way ANOVAs We have already discussed the t-test. The t-test is used for comparing the means of two groups to determine if there is a statistically significant difference between them. The t-test

More information

Does everyone love everyone? The psychology of iterative reasoning

Does everyone love everyone? The psychology of iterative reasoning THINKING & REASONING, 2004, 10 (1), 31 53 Does everyone love everyone? The psychology of iterative reasoning Paolo Cherubini Universita` di Milano-Bicocca, Italy P. N. Johnson-Laird Princeton University,

More information

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo Please note the page numbers listed for the Lind book may vary by a page or two depending on which version of the textbook you have. Readings: Lind 1 11 (with emphasis on chapters 10, 11) Please note chapter

More information

Regression Discontinuity Analysis

Regression Discontinuity Analysis Regression Discontinuity Analysis A researcher wants to determine whether tutoring underachieving middle school students improves their math grades. Another wonders whether providing financial aid to low-income

More information

Chapter-2 RESEARCH DESIGN

Chapter-2 RESEARCH DESIGN Chapter-2 RESEARCH DESIGN 33 2.1 Introduction to Research Methodology: The general meaning of research is the search for knowledge. Research is also defined as a careful investigation or inquiry, especially

More information

Why do Psychologists Perform Research?

Why do Psychologists Perform Research? PSY 102 1 PSY 102 Understanding and Thinking Critically About Psychological Research Thinking critically about research means knowing the right questions to ask to assess the validity or accuracy of a

More information

Two-Way Independent Samples ANOVA with SPSS

Two-Way Independent Samples ANOVA with SPSS Two-Way Independent Samples ANOVA with SPSS Obtain the file ANOVA.SAV from my SPSS Data page. The data are those that appear in Table 17-3 of Howell s Fundamental statistics for the behavioral sciences

More information

Dual-Process Theories: Questions and Outstanding Issues. Valerie A. Thompson University of Saskatchewan

Dual-Process Theories: Questions and Outstanding Issues. Valerie A. Thompson University of Saskatchewan Dual-Process Theories: Questions and Outstanding Issues Valerie A. Thompson University of Saskatchewan Outline Why do we need Dual Process Theories? Integrate with each other, cognitive theories Integration

More information

Sequential similarity and comparison effects in category learning

Sequential similarity and comparison effects in category learning Sequential similarity and comparison effects in category learning Paulo F. Carvalho (pcarvalh@indiana.edu) Department of Psychological and Brain Sciences, Indiana University 1101 East Tenth Street Bloomington,

More information

Metacognition and abstract reasoning

Metacognition and abstract reasoning Mem Cogn (2015) 43:681 693 DOI 10.3758/s13421-014-0488-9 Metacognition and abstract reasoning Henry Markovits & Valerie A. Thompson & Janie Brisson Published online: 22 November 2014 # Psychonomic Society,

More information

Midterm Exam MMI 409 Spring 2009 Gordon Bleil

Midterm Exam MMI 409 Spring 2009 Gordon Bleil Midterm Exam MMI 409 Spring 2009 Gordon Bleil Table of contents: (Hyperlinked to problem sections) Problem 1 Hypothesis Tests Results Inferences Problem 2 Hypothesis Tests Results Inferences Problem 3

More information

A Verbal Reasoning Theory for Categorical Syllogisms

A Verbal Reasoning Theory for Categorical Syllogisms A Verbal Reasoning Theory for Categorical Syllogisms Thad A. Polk and Alien Newell 01July9210:14 DRAFT: Please do not quote or distribute Introduction Human beings are constantly faced with the problem

More information

EXPERIMENTAL RESEARCH DESIGNS

EXPERIMENTAL RESEARCH DESIGNS ARTHUR PSYC 204 (EXPERIMENTAL PSYCHOLOGY) 14A LECTURE NOTES [02/28/14] EXPERIMENTAL RESEARCH DESIGNS PAGE 1 Topic #5 EXPERIMENTAL RESEARCH DESIGNS As a strict technical definition, an experiment is a study

More information

Chapter 5: Field experimental designs in agriculture

Chapter 5: Field experimental designs in agriculture Chapter 5: Field experimental designs in agriculture Jose Crossa Biometrics and Statistics Unit Crop Research Informatics Lab (CRIL) CIMMYT. Int. Apdo. Postal 6-641, 06600 Mexico, DF, Mexico Introduction

More information

Measures of Dispersion. Range. Variance. Standard deviation. Measures of Relationship. Range. Variance. Standard deviation.

Measures of Dispersion. Range. Variance. Standard deviation. Measures of Relationship. Range. Variance. Standard deviation. Measures of Dispersion Range Variance Standard deviation Range The numerical difference between the highest and lowest scores in a distribution It describes the overall spread between the highest and lowest

More information

Generalization and Theory-Building in Software Engineering Research

Generalization and Theory-Building in Software Engineering Research Generalization and Theory-Building in Software Engineering Research Magne Jørgensen, Dag Sjøberg Simula Research Laboratory {magne.jorgensen, dagsj}@simula.no Abstract The main purpose of this paper is

More information

Chapter 02. Basic Research Methodology

Chapter 02. Basic Research Methodology Chapter 02 Basic Research Methodology Definition RESEARCH Research is a quest for knowledge through diligent search or investigation or experimentation aimed at the discovery and interpretation of new

More information

WELCOME! Lecture 11 Thommy Perlinger

WELCOME! Lecture 11 Thommy Perlinger Quantitative Methods II WELCOME! Lecture 11 Thommy Perlinger Regression based on violated assumptions If any of the assumptions are violated, potential inaccuracies may be present in the estimated regression

More information

Lecture Outline. Biost 517 Applied Biostatistics I. Purpose of Descriptive Statistics. Purpose of Descriptive Statistics

Lecture Outline. Biost 517 Applied Biostatistics I. Purpose of Descriptive Statistics. Purpose of Descriptive Statistics Biost 517 Applied Biostatistics I Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Lecture 3: Overview of Descriptive Statistics October 3, 2005 Lecture Outline Purpose

More information

Appendix G: Methodology checklist: the QUADAS tool for studies of diagnostic test accuracy 1

Appendix G: Methodology checklist: the QUADAS tool for studies of diagnostic test accuracy 1 Appendix G: Methodology checklist: the QUADAS tool for studies of diagnostic test accuracy 1 Study identification Including author, title, reference, year of publication Guideline topic: Checklist completed

More information

CHAPTER 3 RESEARCH METHODOLOGY

CHAPTER 3 RESEARCH METHODOLOGY CHAPTER 3 RESEARCH METHODOLOGY 3.1 Introduction 3.1 Methodology 3.1.1 Research Design 3.1. Research Framework Design 3.1.3 Research Instrument 3.1.4 Validity of Questionnaire 3.1.5 Statistical Measurement

More information

Guidelines for reviewers

Guidelines for reviewers Guidelines for reviewers Registered Reports are a form of empirical article in which the methods and proposed analyses are pre-registered and reviewed prior to research being conducted. This format of

More information

CHAPTER 3 METHOD AND PROCEDURE

CHAPTER 3 METHOD AND PROCEDURE CHAPTER 3 METHOD AND PROCEDURE Previous chapter namely Review of the Literature was concerned with the review of the research studies conducted in the field of teacher education, with special reference

More information

Reasoning From Double Conditionals: The Effects of Logical Structure and Believability

Reasoning From Double Conditionals: The Effects of Logical Structure and Believability THINKING AND REASONING, 1998, 4 (2), 97 122 97 Reasoning From Double Conditionals: The Effects of Logical Structure and Believability Carlos Santamaría Universidad de La Laguna, Tenerife, Spain Juan A.

More information

Analysis of data in within subjects designs. Analysis of data in between-subjects designs

Analysis of data in within subjects designs. Analysis of data in between-subjects designs Gavin-Ch-06.qxd 11/21/2007 2:30 PM Page 103 CHAPTER 6 SIMPLE EXPERIMENTAL DESIGNS: BEING WATCHED Contents Who is watching you? The analysis of data from experiments with two conditions The test Experiments

More information

Doctors Fees in Ireland Following the Change in Reimbursement: Did They Jump?

Doctors Fees in Ireland Following the Change in Reimbursement: Did They Jump? The Economic and Social Review, Vol. 38, No. 2, Summer/Autumn, 2007, pp. 259 274 Doctors Fees in Ireland Following the Change in Reimbursement: Did They Jump? DAVID MADDEN University College Dublin Abstract:

More information

STATISTICS AND RESEARCH DESIGN

STATISTICS AND RESEARCH DESIGN Statistics 1 STATISTICS AND RESEARCH DESIGN These are subjects that are frequently confused. Both subjects often evoke student anxiety and avoidance. To further complicate matters, both areas appear have

More information

INVESTIGATING FIT WITH THE RASCH MODEL. Benjamin Wright and Ronald Mead (1979?) Most disturbances in the measurement process can be considered a form

INVESTIGATING FIT WITH THE RASCH MODEL. Benjamin Wright and Ronald Mead (1979?) Most disturbances in the measurement process can be considered a form INVESTIGATING FIT WITH THE RASCH MODEL Benjamin Wright and Ronald Mead (1979?) Most disturbances in the measurement process can be considered a form of multidimensionality. The settings in which measurement

More information

Readings: Textbook readings: OpenStax - Chapters 1 13 (emphasis on Chapter 12) Online readings: Appendix D, E & F

Readings: Textbook readings: OpenStax - Chapters 1 13 (emphasis on Chapter 12) Online readings: Appendix D, E & F Readings: Textbook readings: OpenStax - Chapters 1 13 (emphasis on Chapter 12) Online readings: Appendix D, E & F Plous Chapters 17 & 18 Chapter 17: Social Influences Chapter 18: Group Judgments and Decisions

More information

PTHP 7101 Research 1 Chapter Assignments

PTHP 7101 Research 1 Chapter Assignments PTHP 7101 Research 1 Chapter Assignments INSTRUCTIONS: Go over the questions/pointers pertaining to the chapters and turn in a hard copy of your answers at the beginning of class (on the day that it is

More information

Saville Consulting Wave Professional Styles Handbook

Saville Consulting Wave Professional Styles Handbook Saville Consulting Wave Professional Styles Handbook PART 4: TECHNICAL Chapter 19: Reliability This manual has been generated electronically. Saville Consulting do not guarantee that it has not been changed

More information