Vision Research 48 (2008) 1837-1851
Contents lists available at ScienceDirect
Vision Research
journal homepage: www.elsevier.com/locate/visres
doi:10.1016/j.visres.2008.05.008

Bias and sensitivity in two-interval forced choice procedures: Tests of the difference model

Yaffa Yeshurun a,*, Marisa Carrasco b, Laurence T. Maloney b
a Department of Psychology, University of Haifa, 31905 Haifa, Israel
b Department of Psychology, Center for Neural Science, New York University, 6 Washington Place, New York, NY 10003, USA
* Corresponding author. Fax: +972 4 8240966. E-mail address: yeshurun@research.haifa.ac.il (Y. Yeshurun).

Article history: Received 23 April 2007. Received in revised form 24 April 2008.
Keywords: Psychophysics; Threshold estimation; 2-IFC; Bias

Abstract

We assess four common claims concerning the two-interval forced choice (2-IFC) procedure and the standard Difference Model of 2-IFC performance. The first two are (1) that it is unbiased and (2) that the structure of the 2-IFC task does not in itself alter sensitivity. The remaining two concern a claimed $\sqrt{2}$ enhancement in sensitivity in 2-IFC relative to that measured in a Yes-No task. We review relevant past research and re-analyze seventeen experiments from previous studies across three laboratories. We then report an experiment comparing 2-IFC performance with performance in a second task designed to elucidate observers' decision processes. This second task is simply two successive Yes-No signal detection tasks with the same timing as in the 2-IFC experiment. We find little evidence supporting the claims that 2-IFC is unbiased and that it does not alter sensitivity and we also reject the two claims associated with the Difference Model as a model of performance in our own experiment.

© 2008 Elsevier Ltd. All rights reserved.

1. Introduction

Human performance in psychophysical experiments depends on both sensory and decisional processes. To reliably measure the sensitivity of the sensory processes one should ensure that decisional processes and psychophysical methods do not distort sensitivity measurements. A widely used method for assessing sensitivity is the two-interval forced choice (2-IFC) paradigm. In a 2-IFC task, a single experimental trial consists of two temporal intervals. The signal is presented in one and only one of the intervals and the observer is required to report the interval (first vs. second) in which the signal was presented. The 2-IFC procedure is employed in a wide range of applications in vision, audition, cognition and other fields. Many users believe that (1) it is unbiased ($p_1 = p_2$, where $p_i$ is the probability of a correct response when the signal is in the $i$-th interval), (2) that the procedure itself does not affect the sensitivity of the observer, and that (3) there is a $\sqrt{2}$ enhancement of the signal consistent with a particular model of 2-IFC performance, the Difference Model, which we present below. Over the course of this article we will formulate these claims precisely (splitting the third claim into two claims) and review previous work relevant to each claim. We will report re-analyses of data from published experiments to evaluate the first claim and we will report an experiment designed to test the latter claims. To summarize our conclusions, we find little support for any of these claims. In the last part of the paper, we discuss the implications for use of 2-IFC and other procedures.

Claim 1: $p_1 = p_2$

The 2-IFC paradigm is widely used because it is considered to reduce or eliminate bias. Green and Swets (1973), for example, state that "... the principal value of the forced choice task is that it practically eliminates the need to deal with the observer's decision criterion. Since the errors in a forced choice, unlike the errors in a Yes-No task, do not differ intrinsically in cost, observers find it more natural to maintain the symmetrical criterion." (p. 108). Egan (1975, p. 44ff) gives a succinct summary of all of the assumptions needed to derive the standard Difference Model of 2-IFC performance (which we present below) and the claim that performance will be unbiased. However, he does not assess whether the assumptions are likely to be satisfied in any particular application. More recent texts are cautious. For instance, as we will review below, both Wickens (2002, p. 93ff) and Macmillan and Creelman (2005, p. 175ff) discuss possible failures of the Difference Model or its assumptions, but nevertheless recommend the use of the 2-IFC procedure, stating that "... the procedure discourages bias." (p. 179).

In this article, we first report the results of re-analyses of published data from seventeen 2-IFC experiments for which we were able to recover data for performance in both intervals separately. The experiments were carried out in three separate laboratories and were used to measure visual sensitivity of very different kinds

of tasks. In conclusion, we find large interval biases in all of these studies. Across studies we found biases that favored either the first or the second interval. We report all of the data we could obtain. Moreover, we found that many of our colleagues could not provide us with data to analyze for possible biases because they do not separately record responses in both intervals of a 2-IFC trial. Consequently, there is no way to determine whether or not large interval biases were present in their experiments. The results of this initial analysis led us to consider other claims commonly made concerning 2-IFC.

In addition to the issue of bias, a second issue surrounding the 2-IFC procedure concerns sensitivity: what exactly do 2-IFC experiments measure? For many applications, the experimenter wants only indices of sensitivity that can be compared across two or more conditions within a single experiment. However, if we wish to translate 2-IFC performance into a measure of sensitivity that can be compared to measures of sensitivity derived from other psychophysical procedures, we need a credible model of what the observer is doing in 2-IFC tasks. Accordingly, we briefly review the literature testing the standard Difference Model of 2-IFC and find that it is inconclusive: while there is little reason to accept the model as an accurate model of what observers do in 2-IFC experiments, previous work also gives us no conclusive ground to reject the Difference Model. Accordingly, we performed an experiment where we compared performance in a 2-IFC task with performance in a second task composed of two independent Yes-No tasks with the same stimuli and timing as the 2-IFC task: stimuli can appear in either, both or neither interval. The results of this experiment allow us to reject the Difference Model and also give us some insight into the observer's decision processes in two-interval experiments.

Our conclusions are two.
First, 2-IFC in many applications is not bias free or approximately so. Second, based on previous work and the experiment reported here, the Difference Model of 2-IFC performance is not appropriate for many applications. As a field, we currently do not have a credible process model of what observers do in psychophysical experiments that involve comparison across time and we have no basis to generalize experience with one sort of stimulus to experiments with different stimuli. We recommend that experimenters use 2-IFC methods with caution, if at all, recording data by interval, testing for bias, and reporting it when found. Moreover, we stress that if a difference is found between the two intervals, it is necessary to analyze the effect of each variable or possible variable interactions in each interval. Finally, before any claim could be made based on the Difference Model, one needs to test whether the assumptions of the model hold under the specific experimental conditions of the study.

2. Re-analysis of previous 2-IFC studies

To test whether the 2-IFC paradigm is bias free, or nearly so, we re-analyzed several sets of data from seventeen experiments that have employed a 2-IFC paradigm. These different studies are all published and a detailed description of their goals, methods, and findings can be found in the referenced papers. A short summary of their methods, including a description of stimuli, is given in Appendix A. Here, we are primarily interested in the problem raised by Klein (2001). Does performance as measured by proportion-correct differ significantly in the two intervals? For all data sets, we performed nested hypothesis tests (Mood, Graybill, & Boes, 1974, p. 441) to measure the difference in performance in Interval 1 vs. Interval 2. We tested the hypothesis $H_0: p_1 = p_2$, where $p_1$ is the probability of a correct response when the signal is in the first interval, and $p_2$ is the probability of a correct response when the signal is in the second interval, vs. the alternative $H_1: p_1 \neq p_2$.
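As an illustration of this kind of nested test, the following minimal sketch compares proportion-correct in the two intervals. It assumes a standard likelihood-ratio test for two binomial proportions, with the statistic referred to a chi-square distribution with one degree of freedom; this is an assumption made for illustration and may differ from the authors' exact procedure. The function name `interval_bias_test` and the trial counts in the usage lines are hypothetical.

```python
import math


def interval_bias_test(correct_1, trials_1, correct_2, trials_2):
    """Likelihood-ratio test of H0: p1 == p2 against H1: p1 != p2.

    Assumption: responses in each interval are independent binomial trials.
    Returns (statistic, p_value).
    """

    def log_lik(k, n, p):
        # Binomial log-likelihood without the constant binomial coefficient.
        # Zero counts contribute nothing (the 0 * log 0 = 0 convention).
        total = 0.0
        if k > 0:
            total += k * math.log(p)
        if n - k > 0:
            total += (n - k) * math.log(1.0 - p)
        return total

    p1 = correct_1 / trials_1
    p2 = correct_2 / trials_2
    pooled = (correct_1 + correct_2) / (trials_1 + trials_2)

    # Unrestricted model: a separate proportion for each interval.
    ll_h1 = log_lik(correct_1, trials_1, p1) + log_lik(correct_2, trials_2, p2)
    # Restricted (nested) model: one common proportion.
    ll_h0 = (log_lik(correct_1, trials_1, pooled)
             + log_lik(correct_2, trials_2, pooled))

    statistic = max(0.0, 2.0 * (ll_h1 - ll_h0))
    # Chi-square survival function with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical counts: 90/100 correct in Interval 1, 60/100 in Interval 2.
stat, p = interval_bias_test(90, 100, 60, 100)
print(f"LR statistic = {stat:.2f}, p = {p:.2e}")
```

Only the standard library is used, so the sketch can be run per observer and condition wherever responses were recorded separately by interval.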
The details of the test are described in Appendix B.1. We refer to a difference in probability of correct response in the two intervals as an "interval bias" for brevity. The results of the hypothesis tests include $p$-values that express the consistency between the observed data and the hypothesis under test. Under the null hypothesis, we expect that $p$ will itself be a random variable distributed uniformly across the interval [0,1]. Under the alternative hypothesis, the values of $p$ will tend to be small, and values of $p$ below a conventional level (e.g. $p < .05$) are taken as evidence to reject the null hypothesis. Here, we will report the exact $p$-values of tests. These values have a wide range and it is hard to tell $p < .00001$ from $p < .000001$ at a glance. It is convenient to use $C = \lfloor -\log_{10} p \rfloor$ as an index of the evidence against the null hypothesis in the data. The brackets in the definition imply that we round the value of $C$ down to the nearest integer. Thus, if $p = 10^{-6}$, $C = 6$, and if $p = 10^{-3}$, $C = 3$. The conventional level of $p < .05$ translates to $C > 1.3$ and in many of the plots we will report values of $C$ between 1.3 and 2 as well. We report values of $C > 6$ as 6. Note that any value of $C \geq 1.3$ would usually lead to rejection of the null hypothesis $H_0: p_1 = p_2$ in a single null hypothesis test. Note that the $C$ measure is not a measure of magnitude of the effect (or any information associated with effect size or Type II Error). It is simply Type I Error, transformed logarithmically and reversed in sign.

In Fig. 1A, we plot $p_2$ vs. $p_1$ for seven observers in each of four experimental conditions reported in a texture discrimination study by Wolfson and Landy (1998; see description in Appendix A.1). The value of $C$ is encoded as the radius of the symbol plotted at each point when it is in the range 1.3-6 as shown in the legend. When $C < 1.3$ (and therefore $p > .05$) the point is plotted as a single dot.
Note first that there are 20 out of 28 points for which $C \geq 1.3$ ($p \leq .05$), there are 17 points with $C \geq 3$ corresponding to $p < .001$ in the nested hypothesis test, and 12 points with $C \geq 6$, corresponding to $p < .000001$ in the nested hypothesis test. The distance from the diagonal line is an index of the magnitude of the difference $p_2 - p_1$. It is clear that there are large, significant deviations from the hypothesis that $p_1 = p_2$ for the majority of observer-conditions in this experiment.

In Fig. 1B, we plot $p_2$ vs. $p_1$ for three observers in each of nine experimental conditions reported in a texture segmentation study by Wolfson and Landy (1995; see description in Appendix A.2). The plotting format is the same as that of Fig. 1A. Again, it is clear that there are large, extremely significant deviations from the hypothesis that $p_1 = p_2$ for the majority of the observer-conditions. We note that the differences favored the first interval in Fig. 1A but favor the second in Fig. 1B. Since there is good agreement among observers within each experimental condition it seems unlikely that the observed interval biases are based on idiosyncratic preferences by observers for one interval over the other. We note that trained observers participated in both studies.

A possible explanation for the observed difference in direction of the interval bias in Wolfson and Landy (1995, 1998) is that the temporal spacing between the two presentations of possible targets is too short and one interval is somehow masking the other (Alcalá-Quintana & García-Pérez, 2005). In Fig. 2, we present the results for 15 observers for two ISI (inter-stimulus interval) conditions in a single experiment involving spatial frequency discrimination (McIntosh et al., 1999; see description in Appendix A.3). The ISI was either 500 ms (red symbols) or 4000 ms (blue symbols). The plot formats are identical to those of Fig. 1A and B. Again, there are marked and highly significant interval biases for several observers and both ISIs.
Note that in both conditions some observers do significantly better in Interval 1 than in Interval 2 and some do significantly better in Interval 2 than in Interval 1, but the former is more frequent. In Fig. 3, we plot the value $-\log_{10} p_{4000}$ vs. $-\log_{10} p_{500}$ (the $C$ values without truncation or rounding) to see

whether observers who had a bias of performance with one ISI also had a bias with the other. The Pearson's product moment correlation between the two measures is $\hat{\rho} = 0.44$ (if we omit the two evident outliers near the top of the plot, it rises to $\hat{\rho} = 0.51$). Thus, while the same observers tended to have biases in both conditions, the correlation is modest and accounts for less than 20% of the variance. To summarize, varying the ISI from 500 to 4000 ms seemed to have no effect on the presence or absence of interval biases in a substantial number of observers.

Fig. 1. The results of the re-analysis performed on: (A) Data from a texture discrimination study by Wolfson and Landy (1998). (B) Data from a texture segmentation study by Wolfson and Landy (1995). The distance of each data point from the diagonal line is an index of the difference between Percent Correct in the second interval and Percent Correct in the first interval ($p_2 - p_1$). The radius of the symbol plotted at each point represents the $p$-value of the hypothesis tests [$H_0: p_1 = p_2$; $H_1: p_1 \neq p_2$] when it is in the range 0.05-0.000001 (i.e., when $-\log_{10} p$ is in the range 1.3-6 as shown in the legend). When $p > .05$ the point is plotted as a single dot. It is clear that there are large, extremely significant deviations from the hypothesis that $p_1 = p_2$ for the majority of observer-conditions in these studies.

Last of all, we summarize results for five visual search experiments with a total of 56 observers taken from Carrasco and Yeshurun (1998; see description in Appendix A.4). The results are shown in Fig. 4 for all of the different experiments combined. The different experiments are coded by color as explained in the figure caption. There are fewer violations proportionately, possibly because the accuracy in these experiments was relatively high to allow collection of reliable measurements of RT, but still $C \geq 1.3$ ($p < .05$) for more than 40% of observer-experiments (24 out of 56).
Further analysis of the two attentional conditions included in these experiments (cued vs. neutral trials; see Appendix A.4 for details) revealed biases in performance even on trials in which a spatial cue attracted spatial attention to the target location prior to the presentation of the search display. That is, the asymmetrical performance was present regardless of whether attention was focused in advance on the relevant location.

The data sets presented were those for which we could obtain separate data for the first and second intervals. There were large interval biases in all of the studies considered. Our results support Klein's conjecture: there can be large differences in proportion-correct in the two intervals of a 2-IFC experiment. Interestingly, the presence or absence of these biases does not depend in any obvious way on: (a) the type of experiment or the complexity level of the display (e.g., a single sinusoidal grating in McIntosh et al., 1999 vs. a complex multi-element display of various color-shape combinations in Carrasco & Yeshurun, 1998); (b) whether or not spatial attention is focused on the target location (cued vs. neutral trials in Carrasco & Yeshurun, 1998); (c) the duration of the ISI between the two presentations (500 ms vs. 4000 ms in McIntosh et al., 1999); or (d) whether or not the observers are experienced (e.g., highly experienced observers in Wolfson & Landy, 1995 vs. naïve, inexperienced observers in Carrasco & Yeshurun, 1998).

It is sometimes claimed that experienced psychophysical observers show little or no bias, that is, bias is due to lack of experience: "Occasionally a subject will show a preference for one or another interval; such a preference is usually eliminated with further practice..." (Green & Swets, 1973, p. 108). Others make similar observations (Köhler, 1923; Needham, 1934) in discussing interval biases. We find no evidence to support these claims. Finally, based on Fig. 3, some observers tend to have biases across conditions, but the correlation is modest.
Thus, this finding of an asymmetrical performance shows no obvious patterned dependence on factors such as attentional state, complexity, ISI or practice.

The reader's initial reaction might be to attribute these biases to a bias in "guessing": when observers have no idea whether the signal was in the first or second interval, they guess: some observers stereotypically select the first interval in guessing, some the second. A first difficulty with this explanation is that observers in Wolfson and Landy (1995) and in Wolfson and Landy (1998) tend to have biases in the same direction within each experiment. It is difficult to see how observers coordinated their stereotypical preferences. Moreover, there is a major theoretical difficulty with this explanation. We remind the reader that, in the Difference Model of 2-IFC performance, there is no role assigned to guessing, and for that reason it is often described as "criterion-free". Consequently, whatever the origin of the bias, it corresponds to a failure of this standard Difference Model which we review next.

3. The difference model

All of the models we consider are examples of optimal Bayesian classifiers which, subject to constraints imposed on the observer,

maximize the expected proportion of correct responses (Duda, Hart, & Stork, 2000, chap. 2).

Fig. 2. The results of the re-analysis performed on data from a spatial frequency discrimination study by McIntosh et al. (1999). The distance of each data point from the diagonal line is an index of the difference: Percent Correct in second interval minus Percent Correct in first interval ($p_2 - p_1$). The radius of the symbol plotted at each point represents the $p$-value of the hypothesis tests [$H_0: p_1 = p_2$; $H_1: p_1 \neq p_2$] when it is in the range 0.05-0.000001 (i.e., $-\log_{10} p$ is in the range 1.3-6 as shown in the legend). When $p > .05$ the point is plotted as a single dot. The red symbols depict data from the condition in which the ISI between the intervals was 500 ms and blue symbols depict data from the condition in which the ISI was 4000 ms. Here too, there are marked and highly significant differences in performance between the two intervals for several observers and both ISI conditions.

Fig. 3. A plot of $-\log_{10} p$ by ISI condition across observers (McIntosh et al., 1999); $p$ is the $p$-value of the hypothesis tests. The Pearson's product moment correlation between the two measures is $\hat{\rho} = 0.44$. This modest correlation can account for less than 20% of the variance, and it indicates that, for a substantial number of observers, varying the ISI from 500 to 4000 ms seemed to have no effect on the presence or absence of interval biases.

In the standard Difference Model (Egan, 1975, p. 44ff; Macmillan & Creelman, 2005, p. 165ff; Wickens, 2002, p. 93ff), the observer records a measure of sensory activity, $S_1$, in the first interval and compares it to a measure of sensory activity, $S_2$, in the second. This model is sometimes referred to as the trace model (e.g., Berliner & Durlach, 1973; Durlach & Braida, 1969). These measures $S_1, S_2$ are typically assumed to be statistically independent and identically distributed.
They may be multivariate, corresponding to the activities of many visual mechanisms. If the distributions of the two measures are known, the optimal Bayesian classifier simply computes the posterior probability that the observed pattern of activity is due to the presence of a signal in the first interval, taking into account the prior probabilities that signals occur in each interval. If this posterior probability is greater than .5, the observer responds "first interval"; otherwise, "second interval". The rule just described is optimal in the sense that the observer has the highest possible expected rate of correct response (Duda et al., 2000, chap. 2).

In the special case where the random variables $S_1, S_2$ are both univariate Gaussians$^{1}$ with mean $d' > 0$ and variance 1 when the signal is present in the corresponding interval, and mean 0 and variance 1 when the signal is absent, we can derive a simple form for the decision rule. In Fig. 5A, for example, we show a hypothetical distribution for activity in a task where the sensory signal in each interval is unidimensional. The distribution is bimodal, one mode corresponding to the signal in the first interval, the second to the signal in the second interval. Each mode is bivariate Gaussian and we assume that its standard deviations are $\sigma_X = \sigma_Y = 1$ and that activity in the intervals is uncorrelated. On each trial, the observer records the two sensory signals and compares them, responding "Interval 1" precisely when $S_1 > S_2$. If we represent the pair $(S_1, S_2)$ as a point in the plot of Fig. 5A then the rule becomes "respond Interval 1" precisely when $(S_1, S_2)$ is below the diagonal dashed line shown. This model includes the assumption that sensitivity $d'_1$ in the first interval is equal to sensitivity $d'_2$ in the second, and this is our second claim:

Claim 2: $d'_1 = d'_2$.

Y. Yeshurun et al. / Vision Research 48 (2008) 1837–1851

Fig. 4. The results of the re-analysis performed on data from five visual search experiments performed by Carrasco and Yeshurun (1998). The different experiments are coded by color as follows: Magenta, Orientation; Green, Color × Orientation; Blue, Long Color × Orientation; Red, Color × Shape; Cyan, Ls (see Appendix A.4 for details regarding the different experiments). The distance of each data point from the diagonal line is an index of the difference: Percent Correct in the second interval minus Percent Correct in the first interval ($p_2 - p_1$). The radius of the symbol plotted at each point represents the p-value of the hypothesis test [$H_0: p_1 = p_2$; $H_1: p_1 \neq p_2$] when it is in the range 0.05 to 0.000001 (i.e., $-\log_{10} p$ is in the range 1.3 to 6, as shown in the legend). When $p > .05$ the point is plotted as a single dot. Significant ($p < .05$) performance biases were found for 24 out of the 56 observers. These results indicate that this bias is present regardless of task complexity.

Another way to describe this computation is that the observer computes as decision variable the difference $D = S_2 - S_1$ and responds "Second Interval" if $D > 0$ and otherwise "First Interval".$^{2}$ Interpreted this way, the observer's task is simply an ordinary equal-variance Gaussian signal detection judgment on the random variable $D$. Let $d'_{YN}$ be the difference in activity due to the presence of the signal in an interval (Fig. 5A). This notation needs explanation. If the second interval were not presented to the observer and the observer were instructed to perform a Yes-No task on the first interval (judging whether the signal was present or absent in the first interval), then the observer's measured sensitivity would correspond to $d'_{YN}$. Of course, in the 2-IFC task the observer has additional information from the second interval and, intuitively, we expect that the sensitivity measure for the task on the difference signal $D = S_2 - S_1$ should be greater than $d'_{YN}$.

The intuition is correct and, for the standard Difference Model, $d'_{FC} = \sqrt{2}\, d'_{YN}$, as we derive next. The variance of $D = S_2 - S_1$ is 2, since the variance of the difference of two independent random variables is the sum of their variances, each of which is 1. When the signal is in the first interval the expected value of $D$ is $E[D \mid 1] = -d'_{YN}$ and when it is in the second interval the expected value of $D$ is $E[D \mid 2] = d'_{YN}$. The ratio of the signal range to the standard deviation is then $d'_{FC} = (E[D \mid 2] - E[D \mid 1])/\sqrt{2} = \sqrt{2}\, d'_{YN}$. Thus the effective sensitivity measurement $d'_{FC}$ from the 2-IFC task is $\sqrt{2}$ times greater than the sensitivity $d'_{YN}$ that would be expected in an ordinary single-interval Yes-No signal detection task using the same stimuli (Fig. 5B).

We previously cited one reason that Macmillan and Creelman (2005) give for recommending the use of 2-IFC (it discourages bias). Another reason they give is that "...the predicted $\sqrt{2}$ difference between Yes-No and 2-AFC permits measurement of sensitivity to smaller stimulus differences than may be practical with Yes-No..." (p. 179). The 2-IFC data we re-analyzed in the previous section provide considerable evidence of interval bias in 2-IFC tasks. Thus we are left with the second advantage claimed by Macmillan and Creelman, which is based on the assumption that 2-IFC $d'_{FC}$ is greater than Yes-No $d'_{YN}$. We can separate two claims concerning the values $d'_{FC}, d'_{YN}$. The first is that $d'_{FC} > d'_{YN}$, as presupposed by Macmillan and Creelman. The second is that $d'_{FC} = \sqrt{2}\, d'_{YN}$ (which Macmillan and Creelman do not assume). We will see below that the $\sqrt{2}$ factor is only correct if the observer's sensitivity is the same in the two intervals of the 2-IFC task. When the observer is more sensitive in one interval than the other we will replace this factor by a factor $\tau > 1$ to get the modified claim, $d'_{FC} = \tau d'_1$, where $d'_1$ is the observer's sensitivity in the first interval.

In summary, we have derived the following claims:

Claim 3: $d'_{FC} = \tau d'_1$
Claim 4: $d'_{FC} > d'_{YN}$

where $\tau > 1$ can be computed from the observer's sensitivities $d'_i$, $i = 1, 2$, in the two intervals and $\tau = \sqrt{2}$ when $d'_1 = d'_2 = d'_{YN}$. Claim 4 is derived from the Difference Model but it can be considered apart from the Model as the claim that the observer can use the second interval of the 2-IFC procedure to do better in detection. Claim 3 implies Claim 4 but the converse does not hold.

We emphasize that our notation is not standard (although it is close to that of Wickens, 2002, p. 100). Many researchers prefer to use $d'$ to denote only $d'_{YN}$ and assume that sensitivities measured using other methods such as 2-IFC will be converted to $d'_{YN}$, i.e., the sensitivity measured with 2-IFC could be scaled by $1/\sqrt{2}$ to get a measure of sensitivity comparable to $d'_{YN}$. Of course, we cannot make this assumption since the link between 2-IFC and Yes-No measures of sensitivity (Claim 3) is in question. If we trust the Difference Model as a model for what observers do in 2-IFC experiments, we can readily convert sensitivity measured with 2-IFC tasks to equivalent sensitivity measured with Yes-No tasks and vice versa. Note that, with this model of the 2-IFC task, the ideal observer is never reduced to guessing.

$^{1}$ See Duda et al. (2000) for the general case.
$^{2}$ The case $S_1 = S_2$ occurs with probability 0 and it does not matter how the observer chooses to respond.
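The relation between single-interval and 2-IFC sensitivity in the equal-variance Difference Model can be checked numerically. The following is a minimal sketch, not part of the original analysis: the function names, the trial count, and the random seed are illustrative assumptions. It compares the analytic proportion correct for an unbiased observer, $\Phi(d'_{FC}/2)$ with $d'_{FC} = \sqrt{2}\, d'_{YN}$, against a direct simulation of the rule "respond Second Interval if $S_2 - S_1 > 0$".

```python
import math
import random
from statistics import NormalDist


def predicted_pc_2ifc(d_yn):
    """Analytic proportion correct for an unbiased Difference Model observer."""
    d_fc = math.sqrt(2.0) * d_yn           # d'_FC = sqrt(2) * d'_YN
    return NormalDist().cdf(d_fc / 2.0)    # equals Phi(d'_YN / sqrt(2))


def simulated_pc_2ifc(d_yn, n_trials=100_000, seed=1):
    """Simulate 2-IFC trials with independent unit-variance Gaussian intervals."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(n_trials):
        signal_first = rng.random() < 0.5
        s1 = rng.gauss(d_yn if signal_first else 0.0, 1.0)
        s2 = rng.gauss(0.0 if signal_first else d_yn, 1.0)
        responded_first = (s2 - s1) <= 0.0  # D = S2 - S1 compared to criterion 0
        correct += (responded_first == signal_first)
    return correct / n_trials


if __name__ == "__main__":
    print(predicted_pc_2ifc(1.0), simulated_pc_2ifc(1.0))
```

With $d'_{YN} = 1$ the analytic value is about 0.76, and the simulated value should agree to within sampling error.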
A side benefit of the Difference Model is that it can readily account for interval biases. In the standard model of 2-IFC the sensory difference is compared to a criterion of 0. If instead the observer responds "first interval" whenever $S_1 - S_2$ exceeds a criterion $c < 0$, then she will be biased to respond "first interval" and will have a higher percent correct on average in that interval. If the observer adopts a criterion of $c > 0$, then she will have a higher percent correct on average in the second interval.
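The effect of a non-zero criterion on per-interval performance can be made concrete with a short calculation. This is a sketch under stated assumptions: intervals are independent with unit variance and equal sensitivity, and the observer responds "first interval" when $S_1 - S_2 > c$. The sign convention and the function name are choices made here for illustration.

```python
from math import sqrt
from statistics import NormalDist

PHI = NormalDist().cdf


def interval_pc(d_yn, c):
    """Proportion correct in each interval when 'first' is chosen iff S1 - S2 > c."""
    sd = sqrt(2.0)                          # SD of S1 - S2 for unit-variance intervals
    # Signal in interval 1: S1 - S2 ~ N(+d_yn, 2); correct if the difference exceeds c.
    p_first = 1.0 - PHI((c - d_yn) / sd)
    # Signal in interval 2: S1 - S2 ~ N(-d_yn, 2); correct if the difference is <= c.
    p_second = PHI((c + d_yn) / sd)
    return p_first, p_second


if __name__ == "__main__":
    for c in (-0.5, 0.0, 0.5):
        print(c, interval_pc(1.0, c))
```

Under this convention a negative criterion raises percent correct in the first interval and lowers it in the second, and a positive criterion does the reverse.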

Fig. 5. (A) A hypothetical activity distribution for the optimal Bayesian classifier in the standard Difference Model. According to this model, on each trial, the observer records a measure of sensory activity $S_1$ in the first interval and compares it to a measure of sensory activity $S_2$ in the second interval, responding "Interval 1" when the former is larger than the latter and "Interval 2" otherwise. If we represent this pair of activity measurements as a point in this plot then the optimal decision rule, according to the model, becomes "respond Interval 1" when this point is below the 45 degree dashed line shown and "respond Interval 2" when the point is above the diagonal. The two bivariate Gaussian distributions have standard deviation $\sigma = 1$. The vertical and horizontal dashed lines mark $d'_{YN}$ for the Yes-No signal detection task involving only interval 1 or only interval 2. The dashed line connecting the centers of the two distributions corresponds to the sensitivity of the observer in the 2-IFC task and $d'_{FC} = \sqrt{2}\, d'_{YN}$. (B) The top panel shows the sensory activity $S_i$ in a single interval when a signal is presented and when no signal is presented. The distributions are normalized to have standard deviation $\sigma = 1$ and separation $d'_{YN}$. The middle panel shows the distributions of $D = S_2 - S_1$ when the signal is in the first or second interval of a 2-IFC task. Note that the standard deviations of the distributions are now $\sqrt{2}$, the standard deviation of the difference, and the separation is $2 d'_{YN}$. The bottom panel is the middle panel normalized so that the standard deviations of the two distributions are $\sigma = 1$. The separation is now $d'_{FC} = \sqrt{2}\, d'_{YN}$.

If we refer to the Difference Model with $c = 0$ as the standard model of 2-IFC, we could refer to the Difference Model with possibly non-zero choices of criterion as the Difference Model with Bias (Klein, 2001).
Note, however, that beyond the choice of criterion the observer has no freedom in how he or she executes the Difference Model. By choice of criterion, we can mimic any degree of bias across intervals but, of course, one cannot explain interval biases in performance by means of a model for which the only evidence is the presence of these same interval biases. If the observed interval biases are just the result of a shift of criterion in the standard Difference Model, then there is a standard correction procedure (Green & Swets, 1973, p. 410; Klein, 2001, p. 1425) that can be used to correct for such biases and recover an accurate estimate of sensitivity. We illustrate this correction procedure in Section 5.

However, there is another, darker possibility, noted first by Green and Swets (1973, p. 408): the observed bias may reflect a difference in sensitivity in the two intervals. They and also Macmillan and Creelman (2005, pp. 176–177) adopt this explanation for observed interval biases: failures of Claim 1 are the result of failures of Claim 2. If this were the case, then the experimenter would be in the position of trying to measure visual sensitivity using a procedure, 2-IFC, that alters that sensitivity in at least one of the two intervals of presentation. This failure of the model would correspond to a violation of Claim 2.

Klein (2001) has questioned whether 2-IFC procedures correctly measure observer sensitivity. He describes his own experience in one particular 2-IFC task and notes that an identical signal seemed more intense in one interval than in the other. He suggests that this experience reflects a possible bias in this task, and proposes a bias correction procedure. His observation is likely an example of the time error reported by Köhler (1923; see also Needham, 1934). Observers in a task similar to 2-IFC tend to favor one interval over the other and this tendency depends on the time between the presentations of stimuli.
Alcalá-Quintana and García-Pérez (2005) carried out an experiment designed to test Klein's hypothesis but found only that, when the time between intervals or between trials is too short, the two intervals can interact by masking each other, reducing overall sensitivity to the signal in one interval more than in the other. Were this interaction the only problem for the interpretation of 2-IFC results, it would be easily remedied by increasing the spacing between intervals. However, as demonstrated by the re-analysis of McIntosh et al.'s (1999) data, increasing the ISI between intervals from 500 to 4000 ms did not prevent the emergence of a bias (Fig. 2).

An evident difficulty with 2-IFC designs is that the presentation of the first stimulus may affect the second and vice versa. Unless an experimenter conducts analyses that rule out this possibility, it is possible that the $d'$ measure that the experimenter seeks to measure is really two $d'$ measures, one for each interval. Note also that $d'$ could differ in the two intervals because of a change in noise variance (Wickens, 2002, p. 100ff) and not only because of a change in signal strength (Green & Swets, 1973, p. 408).

There are other possible reasons why Claim 2 might fail. Another evident difficulty for the observer is that, in a 2-IFC experiment, s/he must hold information from the first interval in memory before making a judgment based on information from both intervals (Wickelgren, 1968). This asymmetry could lead to an asymmetry in performance in the two intervals. Nachmias (2006) compared performance in various discrimination tasks and found that performance was superior when the standard stimulus was presented in the first interval rather than the second. He suggested that this asymmetry might imply the use of a memory-based "virtual standard" even with the 2-IFC paradigm, but it is not clear whether these findings carry any implications for 2-IFC detection tasks. In the experiment below, we will directly measure the sensitivity of the observer in the two intervals of a 2-IFC task to test Claim 2.
Before we report that experiment, we discuss Claims 3 and 4 and review previous relevant work.

3.1. Claims 3 and 4

As noted above, for the Difference Model of Fig. 5, the observer's performance can be represented by a measured $d'_{FC}$ based on data and standard signal detection analyses. As derived above, this measured $d'_{FC}$ is a random variable that is an estimate not of the $d'_{YN}$ value associated with a single-interval Yes-No task but of $\sqrt{2}\, d'_{YN}$. When there are distinct $d'$ values in the two intervals, denoted $d'_i$, $i = 1, 2$, we need to modify Claim 3: $d'_{FC} = \sqrt{(d'_1)^2 + (d'_2)^2} = d'_1 \sqrt{1 + \rho^2}$, where $\rho = d'_2/d'_1$ (Wickens, 2002, p. 100ff). When Claim 2 holds, $\rho = 1$ and the $\sqrt{2}$ rule described above results.

Previous studies have compared performance in 2-IFC tasks with performance in Yes-No tasks with identical stimuli. The pattern of results in many of these studies did not support the $\sqrt{2}$ rule. Some studies (e.g., Creelman & Macmillan, 1979; Jesteadt & Bilger, 1974; Leshowitz, 1969; Markowitz & Swets, 1967; Pynn, Braida, & Durlach, 1972; Schulman & Mitchell, 1966; Swets & Green, 1961; Viemeister, 1970; Watson, Kellogg, Kawanishi, & Lucas, 1973) found $d'_{FC}/d'_{YN}$ ratios that were larger than $\sqrt{2}$. For instance, Jesteadt and Bilger (1974) compared frequency discrimination and intensity discrimination with both 2-IFC and Yes-No tasks and found that, for both frequency and intensity discrimination, performance in the 2-IFC task was better than performance in the Yes-No task by a factor considerably larger than $\sqrt{2}$ (2.1 and 2.13, respectively). Similarly, Viemeister (1970) found for intensity discrimination a $d'_{FC}/d'_{YN}$ ratio of 1.91, and Creelman and Macmillan (1979) found for both frequency and phase discrimination that the ratio $d'_{FC}/d'_{YN}$ for 2-IFC and Yes-No paradigms was about 2 rather than $\sqrt{2}$. A large $d'_{FC}/d'_{YN}$ ratio (1.75) was also found for the detection of a brief 1000 Hz sinusoid in the presence of noise plus pedestal (Leshowitz, 1969).
Other studies (e.g., Leshowitz, 1969; Markowitz & Swets, 1967; Swets & Green, 1961) have found $d'_{FC}/d'_{YN}$ ratios that were smaller than $\sqrt{2}$. Markowitz and Swets (1967) compared performance in auditory detection with the two paradigms and found a mean ratio of 1.15, with one of the observers being more sensitive in the one-interval experiment. Similar results were found by Leshowitz (1969) when the task involved simple detection of a sinusoid added to a noise that was either continuously present or presented only during the observation interval (ratios of 1.19 and 1.35, respectively). As in Markowitz and Swets' (1967) study, one of Leshowitz's three observers actually performed better in the one-interval than in the two-interval condition. Finally, Schulman and Mitchell (1966) also examined auditory signal detection with both paradigms, and found an averaged $d'_{FC}/d'_{YN}$ ratio of 1.46, which is close to $\sqrt{2}$, but the ratios of individual observers revealed considerable deviations from $\sqrt{2}$, ranging from 1.17 to 1.77.

Many of the studies just cited lack estimates of the standard error of the ratios of $d'$ values obtained in different tasks, making it difficult to assess the status of the Difference Model given these results. We note, for example, that values of $\tau = \sqrt{1 + \rho^2} < 1$, as reported for one observer each in the studies of Leshowitz (1969) and Markowitz and Swets (1967), are inconsistent with the Difference Model. However, we cannot determine from the published work that the reported estimates of $\tau$ were significantly less than 1. If we accept that Claim 2 is false then any measured value of $\tau > 1$ can be explained as the result of a hypothetical difference in sensitivity in the two intervals: $d'_1 \neq d'_2$. Such results do not challenge Claim 3, only Claim 2. Based on the literature just summarized, there is little reason to accept the Difference Model or any of the conclusions based on it and little reason to reject it.
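One way to see where the factor $\tau = \sqrt{1 + \rho^2}$ (with $\rho = d'_2/d'_1$) comes from, and why it cannot fall below 1, is to work through the likelihood-ratio rule for two independent unit-variance Gaussian intervals. The following is a sketch under those assumptions. The symbol $L$ for the decision variable is introduced here for illustration and is not the authors' notation.

```latex
\begin{align*}
\text{Signal in interval 1:}\quad & S_1 \sim N(d'_1, 1),\quad S_2 \sim N(0, 1) \\
\text{Signal in interval 2:}\quad & S_1 \sim N(0, 1),\quad S_2 \sim N(d'_2, 1) \\[4pt]
\log\frac{f(S_1, S_2 \mid 2)}{f(S_1, S_2 \mid 1)}
  &= d'_2 S_2 - d'_1 S_1 + \tfrac{1}{2}\left[(d'_1)^2 - (d'_2)^2\right] \\
L &= d'_2 S_2 - d'_1 S_1 \\
\operatorname{Var}(L) &= (d'_1)^2 + (d'_2)^2 \\
E[L \mid 2] - E[L \mid 1] &= (d'_2)^2 - \left(-(d'_1)^2\right) = (d'_1)^2 + (d'_2)^2 \\
d'_{FC} &= \frac{E[L \mid 2] - E[L \mid 1]}{\sqrt{\operatorname{Var}(L)}}
         = \sqrt{(d'_1)^2 + (d'_2)^2}
         = d'_1 \sqrt{1 + \rho^2}
\end{align*}
```

Since $\rho^2 \geq 0$, the factor satisfies $\tau \geq 1$, with $\tau = \sqrt{2}$ when $\rho = 1$. When $d'_1 = d'_2$ the variable $L$ is proportional to $S_2 - S_1$, which recovers the simple difference rule.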
If the experimenter does not run a control experiment comparing $d'_{FC}$ and $d'_{YN}$, there is little reason to assume that sensitivity measured by 2-IFC can be used to estimate the sensitivity that would be measured with the same stimuli in a Yes-No task. We conclude that, given only the results $d'_{FC}$ of a 2-IFC experiment, the experimenter knows little about $d'_{YN}$. Moreover, the literature just summarized included experiments with many different kinds of stimuli. It is plausible that the observed differences in estimates of $\tau$ are in part due to the kinds of stimuli employed. Based on these results, there is little reason to think that the experimenter can generalize experience with one sort of stimulus to another. We return to this point in Section 5.

In the next section, we report an experiment designed to directly investigate how and whether observers guess and to test the Difference Model. We do not compare a Yes-No task with a 2-IFC task based on the same stimuli. Instead we compare a 2-IFC task with a task in which the observer is simply required to carry out two Yes-No tasks with the same timing as the 2-IFC task. In particular, the target stimulus could appear in either, both or neither interval. With this design we allow for the possibility that sensitivity differs in the two intervals and measure $d'_i$, $i = 1, 2$, separately for each interval under conditions that duplicate the conditions of the 2-IFC task. With this design, we can estimate $d'_1, d'_2$ directly and test Claim 2. With knowledge of $d'_1, d'_2$ we estimate $\tau$ and test Claim 3.

4. Experiment

Twenty-two observers participated in an experiment that included two sessions. The 2-way session employed a typical 2-IFC paradigm. Each trial included two temporal intervals and the target appeared in one of the intervals. The 4-way session employed a novel adaptation of that paradigm. A single trial also included two intervals, but in the 4-way condition the target could appear in one of the intervals, in both intervals, or in neither of them. The four possible outcomes were equally likely. We adopt the convention that the four possible kinds of trials in the 4-way condition are labeled YN, NY, NN, YY (i.e., the target appears in: 1st interval, 2nd interval, neither interval, or both intervals), to emphasize that the 4-way task is simply two successive Yes-No signal detection tasks.

On trials in the 4-way condition where YN or NY is presented (approximately half of the trials), the observer's sensory state should be equivalent to that of corresponding trials in the 2-way task. However, in the 4-way task, the states nn ("the sensory activity in both intervals is below the criterion") and yy ("the sensory activity in both intervals is above the criterion") are now legitimate possible outcomes of a trial and the observer is permitted to respond nn and yy. The 4-way task allows us to test the standard model and also to learn something about the observer's decision process on those trials (YN and NY) where sensory events are identical to those in the 2-way task.

To our knowledge this experiment is the first to combine measurement of 2-IFC performance with separate measurement of sensitivity in the two intervals of the task with the same timing and design. Had we simply compared 2-IFC performance to Yes-No performance, we could not be sure that the presence of the second interval had not altered measured 2-IFC sensitivity (violating Claim 2), as several researchers have suggested (reviewed above).

In Fig. 6, we show data sets for one of our observers to illustrate the design and its analysis. We have labeled the 2-way trials as YN and NY by analogy to the 4-way condition, and the observer's responses as yn and ny. In the actual experiment, observers responded 1, 2, 3, 4 for yn, ny, nn, yy, respectively, in the 4-way condition and, in the 2-way condition, could only respond 1 (yn) or 2 (ny).
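To make the structure of the 4-way data concrete, the sketch below collapses a 4 × 4 table of trial types by responses into hit and false-alarm rates for each interval, and from those a simple marginal $d'$ per interval. The counts are those shown for observer S02 in Fig. 6B; the function and variable names are illustrative. This marginal calculation is offered only for orientation and is an assumption of this sketch: the authors fit the 4-way data by maximum likelihood (Appendix B.3), so these values need not match their estimates.

```python
from statistics import NormalDist

Z = NormalDist().inv_cdf

# Response counts for observer S02 in the 4-way condition (Fig. 6B).
# Outer key: trial type (target present Y / absent N in intervals 1 and 2).
# Inner key: response (y / n for intervals 1 and 2).
COUNTS = {
    "YN": {"yn": 76, "ny": 1, "nn": 12, "yy": 13},
    "NY": {"yn": 4, "ny": 80, "nn": 6, "yy": 12},
    "NN": {"yn": 19, "ny": 10, "nn": 66, "yy": 6},
    "YY": {"yn": 11, "ny": 17, "nn": 2, "yy": 72},
}


def interval_rates(counts, interval):
    """Hit and false-alarm rates for one interval (0 = first, 1 = second)."""
    hits = misses = false_alarms = correct_rejections = 0
    for trial, row in counts.items():
        present = trial[interval] == "Y"
        for response, n in row.items():
            said_yes = response[interval] == "y"
            if present and said_yes:
                hits += n
            elif present:
                misses += n
            elif said_yes:
                false_alarms += n
            else:
                correct_rejections += n
    hit_rate = hits / (hits + misses)
    fa_rate = false_alarms / (false_alarms + correct_rejections)
    return hit_rate, fa_rate


def marginal_dprime(counts, interval):
    """Simple Yes-No d' for one interval: z(hit rate) - z(false-alarm rate)."""
    hit_rate, fa_rate = interval_rates(counts, interval)
    return Z(hit_rate) - Z(fa_rate)


if __name__ == "__main__":
    for i in (0, 1):
        print(i + 1, interval_rates(COUNTS, i), marginal_dprime(COUNTS, i))
```

Treating each interval as its own Yes-No task in this way mirrors the description of the 4-way condition as two successive Yes-No detection tasks, but it ignores any dependence between the two responses within a trial.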
Note that the first two rows of the 4-way table are the trials that could occur in the 2-way table as well, except that now the observer can legitimately report that he or she is in state nn or yy. We wish to examine estimated $d'_{FC}$ in the 2-way session and $d'_i$, $i = 1, 2$, in the 4-way session to test Claim 2 and Claim 3. We describe how we fit 2-way data via the Difference Model in Appendix B.2. We describe how we fit 4-way data in Appendix B.3, also by an application of the optimal Bayesian classifier.

(A) 2-way design

             Response
Trial      yn     ny    Total
YN        168     36      204
NY         44    160      204

(B) 4-way design

                 Response
Trial      yn     ny     nn     yy    Total
YN         76      1     12     13      102
NY          4     80      6     12      102
NN         19     10     66      6      101
YY         11     17      2     72      102

Results for a single subject (S02).

Fig. 6. Design and sample data of: (A) The 2-way condition, employing a typical 2-IFC paradigm. In this condition two kinds of trials were possible: YN, in which the target was present in the first interval, and NY, in which the target was present in the second interval. Accordingly, two kinds of response were possible: yn and ny, corresponding to the YN and NY trials. (B) The 4-way condition, employing a novel adaptation of the 2-IFC paradigm, in which four kinds of trials were possible: YN and NY, which are identical to the 2-way condition, YY, in which the target was present in both intervals, and NN, in which the target was present in neither of the intervals. Accordingly, four kinds of response were possible: yn, ny, yy, and nn, corresponding to the YN, NY, YY, and NN trials. The response distribution across the different kinds of trials is reported for one of our observers (S02) to illustrate the design.

4.1. Methods

4.1.1. Observers

Twenty-two naïve observers at the University of Haifa participated in the experiment; all had normal or corrected-to-normal vision. Each completed the 2-way and 4-way conditions in separate sessions. For two of the observers we could not compute a stable estimate of $d'$ in either the 2-way or 4-way condition because they made few or no errors in one or the other task.
We excluded these observers from further analysis.

4.1.2. Stimuli

The target was a single 8 cpd Gabor patch with 30° orientation and 10% contrast presented at the center of the screen. The standard deviation of the Gaussian window of the Gabor was 2 cycles of the windowed grating. In the 2-way condition the target was present in the first or second interval (YN, NY, respectively); each occurred at random on 50% of the trials. In the 4-way condition there were four types of trials, each occurring at random on 25% of the trials: YN, the target was present only in the first interval; NY, the target was present only in the second interval; YY, the target was present in both intervals; NN, the target was present in neither interval.

4.1.3. Procedure

Each temporal interval began with a fixation dot presented for 600 ms at the center of the screen. On intervals that contained the target, a Gabor patch was then briefly presented at the center. The duration of the target presentation was set individually to keep performance level at approximately 85% correct. It varied between 15 and 105 ms (mean = 46 ms, SD = 26 ms). On intervals without a target, the screen was blank for the corresponding duration. An additional 300-ms blank screen served as the ISI between intervals. In the 2-way condition, the observers were asked to hit the "1" key for a YN trial and the "2" key for an NY trial. In the 4-way condition observers were asked to hit the "1" key for a YN trial, the "2" key for an NY trial, the "3" key for an NN trial, and the "4" key for a YY trial. The order of the different trial types was randomized within a condition. Each observer participated in 816 trials (408 per condition). All observers performed the 2-way condition first to ensure that whatever strategy they employed during the 4-way condition would not contaminate their performance in the 2-way condition.

4.2. Analysis and results

We first compared $p_1$ (proportion correct, first interval) and $p_2$ (proportion correct, second interval), just as we did for the data sets in the first section, to evaluate whether there were marked interval biases for any observers. Over a third of the observers (8/22) had marked 2-way biases, seven favoring the first interval. These data are presented in Fig. 7, in the same format as Fig. 1.

Next, we estimated the $d'_i$, $i = 1, 2$, values and criterion values in the two intervals from the 4-way condition by maximum likelihood methods as explained in Appendix B.3. In Fig. 8, we plot $d'_2$ vs. $d'_1$. A least-squares regression of $d'_2$ on $d'_1$ (without intercept) results in a slope of 0.908, which is significantly different from 1 ($p < .05$). The 95% confidence interval for the slope is (0.844, 0.973). The observers are, overall, slightly more sensitive in the first interval than the second. This outcome is the opposite of what Klein (2001) reported based on his own experience (discussed above). We reject Claim 2 but note that the difference ($d'_1 > d'_2$) is only about 9%.

Were the 2-way and 4-way analyses consistent with the Difference Model, we would expect the $d'_{FC}$ value for the 2-way condition to be greater than that for the intervals in the 4-way condition (Claims 3 and 4). Because the $d'_2$ values (second interval) are about 9% less than the $d'_1$ values (first interval), this factor will not be $\sqrt{2}$ but instead $\tau = \sqrt{1 + \rho^2}$, where $\hat{\rho} = 0.908$ is the ratio of $d'$ values just estimated (Wickens, 2002, p. 100). We derived this rule earlier in discussing the optimal Bayesian observer. Thus, $\hat{\tau} = 1.35$, slightly less than $\sqrt{2}$.

We plot the mean of the estimated $d'_1$ for the 4-way condition vs. the estimated 2-way $d'_{FC}$ in two formats in Fig. 9. First, we plot estimates of $\tau d'_1$ vs. estimates of $d'_{FC}$ in Fig. 9A. If Claim 3 is correct and $d'_{FC} = \tau d'_1$ then we expect to find that the plotted points fall symmetrically about the dashed diagonal line. Examination of the scatter plot indicates that this is not the case. We regressed the estimates of $\tau d'_1$ vs. estimates of $d'_{FC}$ (without intercept term). The estimate of the regression slope is 1.49 with 95% confidence interval (1.25, 1.74). We reject Claim 3 and the Difference Model.

Next, we plot estimates of $d'_1$ (without the correction factor $\tau$) vs. estimates of $d'_{FC}$ in order to test Claim 4 (Fig. 9B). We regressed the estimates of $d'_1$ vs. estimates of $d'_{FC}$ (without intercept term). The estimate of the regression slope is 1.10 with 95% confidence interval (0.92, 1.29). The slope is not significantly different from 1 at the 0.05 level. We find no support for Claim 4: $d'_{FC} > d'_{YN}$.

If we had not run the 4-way condition, then we could interpret the fitted slope in Fig. 9B as an estimate of $\tau$ and we could solve for $\rho = d'_2/d'_1 = 0.47$. We would conclude that Claim 3 could hold if the observer is remarkably insensitive in the second interval relative to the first: $d'_2 = 0.47\, d'_1$. But we did measure $\rho$ separately in the 4-way condition; it is not 0.47 but rather 0.908, and we reject this possibility and Claim 3. These results are inconsistent with the Difference Model for 2-IFC data even if we allow for the possibility of differential sensitivity in the two intervals (failures of Claim 2).

Finally, to ensure that the lack of significant differences between the $d'$ values of the 2-way and 4-way conditions was not due to practice effects, we compared performance in the first and second halves of the 2-way condition. If performance could still be improved with practice when the observers finished the 2-way condition and started the 4-way condition, there should be evidence of this improvement when we compare performance in the first half of the 2-way condition to that in the second half. We found no significant differences in performance between the two halves ($F < 1$). In fact, percent correct in the two halves was almost identical (1st half: 84.9%; 2nd half: 84.3%).

To summarize the findings of this experiment: (a) considerable interval biases were found with the 2-way condition (corresponding to the typical 2-IFC task). These findings are consistent with previous reports of biases and with the data sets re-analyzed above, and together they question the claim that the 2-IFC paradigm discourages biases (Claim 1); (b) no support was found for

Fig. 7. A plot of performance in the first interval vs. the second for the 2-way condition. The distance of each data point from the diagonal line is an index of the difference between Percent Correct in the second interval and Percent Correct in the first interval ($p_2 - p_1$). The radius of the symbol plotted at each point represents the p-value of the hypothesis test [$H_0: p_1 = p_2$; $H_1: p_1 \neq p_2$] when it is in the range 0.05 to 0.000001 (i.e., when $-\log_{10} p$ is in the range 1.3 to 6, as shown in the legend). When $p > .05$ the point is plotted as a single dot. Again, significant interval biases were found: over a third of the observers (8/22) had marked 2-way biases.

Fig. 8. A plot of $d'$ in the second interval vs. the first for the 4-way condition. A least-squares regression of $d'$ in Interval 2 on $d'$ in Interval 1 (without intercept) results in a slope of 0.908 (solid line), which is significantly different from 1 ($p < .05$). The 95% confidence interval for the slope is (0.844, 0.973). The observers are, overall, slightly more sensitive in the first interval than the second.

Fig. 9. (A) The mean $d'$ estimates for the 4-way condition multiplied by $\tau$ vs. the corresponding $d'$ estimates for the 2-way condition for each observer. The multiplier $\tau$ is based on the actual measured sensitivity of observers (Fig. 8 and text) and would be $\sqrt{2} = 1.414$ only if the observers' sensitivity in the two intervals were the same. Here, it is slightly smaller, $\tau = 1.35$. If the Difference Model accurately described these observers, we would expect the points to be distributed around the diagonal dashed line $d'_{FC} = \tau d'_{YN}$. They are not and we reject the Difference Model. The failure of the Difference Model cannot be explained by a hypothetical difference in sensitivity in the two intervals since we measured sensitivity in the two intervals and factored it into the prediction. The data point corresponding to the observer in Fig. 6 is circled. (B) The same data are re-plotted without the multiplier $\tau$. The diagonal dashed line now corresponds to the prediction that there is no benefit whatsoever from taking a difference of sensory signals and $d'_{FC} = d'_{YN}$. The data point corresponding to the observer in Fig. 6 is circled.
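The arithmetic linking the sensitivity ratio to the multiplier can be reproduced in a few lines. The sketch below assumes only the relation between the multiplier (written `tau`) and the ratio of second-interval to first-interval sensitivity (written `rho`), namely tau = sqrt(1 + rho^2); the function names are illustrative and not from the paper.

```python
from math import sqrt


def tau_from_rho(rho):
    """Multiplier tau = sqrt(1 + rho^2), where rho = d'_2 / d'_1."""
    return sqrt(1.0 + rho * rho)


def rho_from_tau(tau):
    """Invert tau = sqrt(1 + rho^2); only defined for tau >= 1."""
    if tau < 1.0:
        raise ValueError("tau < 1 is not attainable under the Difference Model")
    return sqrt(tau * tau - 1.0)


if __name__ == "__main__":
    print(round(tau_from_rho(0.908), 2))   # 1.35, the value given in the text
    print(round(tau_from_rho(1.0), 3))     # 1.414, the equal-sensitivity case
    print(round(rho_from_tau(1.10), 2))    # 0.46, from the rounded slope 1.10
```

Inverting the rounded slope of 1.10 gives a ratio of about 0.46, close to the 0.47 quoted in the text. The small gap may reflect rounding of the reported slope.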