Running head: SEPARATING DECISION AND ENCODING NOISE

Separating Decision and Encoding Noise in Signal Detection Tasks


Separating Decision and Encoding Noise in Signal Detection Tasks (Psychological Review, in press)

Carlos Alexander Cabrera1, Zhong-Lin Lu1, and Barbara Anne Dosher2

The Ohio State University1, University of California, Irvine2

Author Note

Carlos Alexander Cabrera and Zhong-Lin Lu, Laboratory of Brain Processes (LOBES), Center for Cognitive and Brain Sciences, Department of Psychology, The Ohio State University. Barbara Anne Dosher, Department of Cognitive Sciences and Institute of Mathematical Behavioral Sciences, University of California, Irvine.

This research was supported by National Institute of Health Grant MH8118 and National Eye Institute Grant EY17491. Correspondence concerning this article should be addressed to Carlos Cabrera, Department of Psychology, The Ohio State University, 6 Psychology Building - 1835 Neil Avenue, Columbus, OH 43210. E-mail: cabrera.36@osu.edu. Matlab scripts related to this research are available for download at: http://lobes.osu.edu/downloads/mcr.zip

Abstract

In this paper we develop an extension to the Signal Detection Theory (SDT) framework to separately estimate internal noise arising from representational and decision processes. Our approach constrains SDT models with decision noise by combining a multi-pass external noise paradigm with confidence rating responses. In a simulation study we present evidence that representation and decision noise can be separately estimated over a range of representative underlying representational and decision noise level configurations. These results also hold across a number of decision rules and show resilience to rule misspecification. The new theoretical framework is applied to a visual detection confidence-rating task with three and five response categories. This study complements and extends the recent efforts of researchers (Benjamin, Diaz, & Wee, 2009; Mueller & Weidemann, 2008; Rosner & Kochanski, 2009; Kellen, Klauer, & Singmann, 2012) to separate and quantify underlying sources of response variability in signal detection tasks.

Keywords: decision noise, internal noise, external noise, signal detection theory, double-pass, confidence rating

Separating Decision and Encoding Noise in Signal Detection Tasks

Signal detection theory (SDT; Green & Swets, 1966; Peterson, Birdsall, & Fox, 1954; Tanner & Swets, 1954) remains one of the most influential models of cognitive science. Disparate areas of psychological research have adopted SDT as an explanatory framework for a broad range of topics including sensation and perception (Fechner, 1860; Tanner & Swets, 1954), category perception (Macmillan, Kaplan, & Creelman, 1977), recognition memory (Wickelgren & Norman, 1966), attention (Lu & Dosher, 1998), perceptual learning (Dosher & Lu, 1998, 1999), group decision behavior (Sorkin & Dai, 1994; Sorkin, Hays, & West, 2001), neurophysiology (Britten, Shadlen, Newsome, & Movshon, 1992), and clinical applications (McFall & Treat, 1999). Many studies have found application for SDT in areas far beyond traditional psychological studies (Hutchinson, 1981; McClelland, 2011).

The fundamental assumptions of SDT include a representation stage and a response stage. The representation stage assumes a noisy transformation mediating the mapping between an external stimulus and an internal response along a decision axis. Over the course of many trials, a specific stimulus elicits internal responses with some mean level of activation (corresponding to stimulus strength) and some variability (corresponding to the noise in the internal response), so that the observer's internal representation takes the form of a probability density function. Stimuli of different strengths lead to probability density functions with different means along the decision axis and potentially different variances as well. The response stage assumes that observers use criteria to partition the decision axis in order to map internal responses to observable decisions (Figure 1, top panel).
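These two stages are straightforward to simulate. The sketch below is our own illustration of the equal-variance special case, not the authors' Matlab scripts; the function name, parameter values, and the choice of Python are our assumptions. Internal responses are drawn from unit-variance Gaussians, a fixed criterion maps each sample to a yes/no response, and the generating parameters are then recovered from the simulated hit and false alarm rates.

```python
import random
from statistics import NormalDist

def simulate_yes_no(d_prime, criterion, n_trials=100_000, seed=1):
    """Two-stage SDT observer: draw a Gaussian internal response
    (signal mean d', noise mean 0, both unit variance) and answer 'yes'
    when the sample exceeds a fixed criterion. Returns (HR, FAR)."""
    rng = random.Random(seed)
    hr = sum(rng.gauss(d_prime, 1.0) > criterion for _ in range(n_trials))
    far = sum(rng.gauss(0.0, 1.0) > criterion for _ in range(n_trials))
    return hr / n_trials, far / n_trials

# Recover the generating parameters from the observable rates via z-scores:
z = NormalDist().inv_cdf                 # the inverse-Gaussian (probit) map
hr, far = simulate_yes_no(d_prime=1.5, criterion=0.75)
d_est = z(hr) - z(far)                   # estimate of d' (close to 1.5)
c_est = -z(far)                          # criterion in noise-sd units (close to 0.75)
```

With enough trials, the z-transformed rates recover d' and the criterion up to sampling error, which is the sense in which HR and FAR index positions on the decision axis in standard-deviation units.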

4 Tis relatively simple model as recently been described as one of te most successful teoretical frameworks and matematical models in psycology (Benjamin et al., 29; Kellen, Klauer, & Singmann, 212). However, results from a number of studies ave undermined some of te assumptions of SDT, most notably te assumption tat decision criteria remain fixed upon a decision axis over te sequence of trials in an experiment (Benjamin, Tullis, & Lee, 213; Mueller & Weidemann, 28; Wickelgren, 1968). An alternative possibility is tat decision criteria fluctuate from trial to trial over te course of te experiment (Figure 1, bottom panel). Evidence tat callenges te noiseless decision mecanism may appeal to a reevaluation of te principle measures of sensitivity and bias, as decision noise may modify te interpretation of tese estimates and te conclusions drawn from tem. Experimental metods capable of distinguising representation and decision noise in signal detection tasks will serve to estimate decision noise and to evaluate te impact of criterion variability on SDT parameter estimates. So far, suc metods are few and restrictive, so tat it is often impossible to know weter reevaluation is even necessary for many SDT tasks. In tis paper, we build suc a framework to separately estimate decision and representation noise components at te decision stage. We begin wit an overview of te SDT framework and a review of te empirical evidence suggesting tat decision boundaries are variable or noisy, along wit a review of recent efforts to identify and quantify decision noise in categorical judgment tasks wit at least tree stimulus classes (Rosner & Kocanski, 29). We ten develop a new framework tat combines a decision noise model for a confidence rating procedure wit a multi-pass external noise paradigm (Burgess & Colborne, 1988; Green, 1964; Lu & Doser, 28). Using simulations, we demonstrate te feasibility of parameter recovery tat estimates te separate contributions of

5 decision and representation noise for tree different decision rules. Our development applies to tasks wit only two stimulus classes over a range of possible underlying noise configurations, i.e., different relative levels of representation and criterion noise. We ten illustrate tis metod wit an application using a multi-pass visual detection experiment wit external noise. Finally, we consider some ideas for future studies as well as limitations of tis framework. Details of our experiment along wit derivations and a more formal analysis of tis framework are provided in te appendix. SDT and Static Criteria In a typical yes/no signal detection experiment, an observer monitors an observation interval for te presence of a designated signal stimulus. Te observer responds affirmatively if se believes te signal was present during tis interval. Te observer cannot respond wit perfect accuracy on every trial, sometimes correctly reporting te presence of a signal wen a signal stimulus in fact occurred, but sometimes incorrectly affirming te presence of a signal wen a signal was not present. Te it rate (HR) is te relative frequency of saying yes wen a signal is present; te false alarm rate (FAR) is te relative frequency of saying yes wen a signal is not present. Misses and correct rejections are te relative frequencies of saying no wen a signal is present and wen a signal is absent. Manipulation of te observer's 'yes' rate by canging task instruction, pay-off structure, or stimulus base rates elicits different values of HR and FAR, and te HR plotted against te FAR defines te receiver operating caracteristic (ROC, Figure 2, left; Green & Swets, 1966). Te data from empirical ROCs often comprise te fundamental features researcers wis to model in signal detection tasks. In most applications, SDT posits internal representations in

6 te form of Gaussian random variables wit mean values positioned along a decision axis and monotonically related to stimulus strengt (Graam, 1989). Consequently, te representational distributions of two stimuli of different strengt often overlap, leaving some non-zero likeliood tat a stimulus sample from eiter stimulus class (signal present or signal absent) could ave generated te internal response in a given trial. Many signal detection models assume tat te observer responds by establising a boundary or criterion along te decision axis, and cooses yes wen te value of te sampled internal representation exceeds tis criterion, and cooses no oterwise (Figure 2, rigt panels). Representations from signal present trials exceeding te criterion contribute to HR, and representations of signal absent trials exceeding te criterion contribute to FAR. Insofar as distributions of internal representations really do approximate Gaussian probability density functions, HR and FAR may be transformed into standardized scores (z-scores) to indicate te position of te criteria along te decision axis in units of te standard deviation of te underlying distributions (see Appendix A.1). Empirical zroc functions are often approximately linear, consistent wit te Gaussian distribution assumption (Macmillan & Creelman, 24). Te classical SDT model does not incorporate trial-by-trial variability in te criterion position, so all response variability accrues from variations in te internal representations of te stimuli (Benjamin et al, 29). Wile some simple SDT applications assume equal variances for signal present and signal absent distributions, researcers frequently relax tis equal variance assumption to account for te non-unity slopes often observed in many empirical zroc's. Meanwile, te static criterion assumption as rarely been relaxed. Early formulations of SDT excluded decision noise for two reasons (Tanner & Swets, 1954). First, because a static decision mecanism was optimal

7 and part of a cognitive operation, an observer would not willingly coose to vary its operation from trial to trial, since tis variable strategy would lead to lower overall performance (Benjamin et al, 213; Mueller & Weidemann, 28). And second, typical analyses of signal detection data simply could not differentiate between noise arising from representational and decision-related processes (Figure 2, rigt panels; see Wickelgren, 1968). Evidence for Criterion Variability Toug practical considerations led to omissions of criterion variability in early applications of signal detection teory, in fact, lines of evidence suggesting a variable decision process predate even te Turstonian framework (Fernberger, 192). Later, reduced performance on absolute identification due to increased stimulus range was attributed to increased variance in identification criteria (te range effect; Pollack, 1952). Early researc in auditory amplitude identification led to te explanation tat te cange in response variability arose due to subjects exibiting a range-dependent criterion noise (also interpreted as memory noise; see Durlac & Braida, 1969). Later researc suggested an independence between te range effect and te total number of response categories (Braida & Durlac, 1972) and specifically implicated te criterial range as te source of te performance decrement (Gravetter & Lockead, 1973), toug not to te exclusion of representation-related mecanisms as well (Luce, Nosofsky, Green, & Smit, 1982; Luce & Nosofsky, 1984; Nosofsky, 1983). Additionally, investigators ave invoked criterion noise to elp explain anomalies in te sape of te ROC curve (Murray, Bennett, & Sekuler, 22; Mueller & Weidemann, 28; Wickelgren, 1968); discrepancies in distributionfree estimates of response bias in confidence rating tasks (Mueller & Weidemann, 28); performance decrements related to larger rating scales in confidence ratings tasks (Benjamin et

8 al, 213); and feedback-associated manipulation (Carterette, 1966) and learning (Friedman, Caterette, Nakatani, & Aumada, 1968) in auditory amplitude detection. Oters ave suggested tat decision noise results from criterion-setting mecanisms for reconstructing stimulus representations at te decision level (Parks, 1966); and tat criterion noise is related to nonoptimal criterion sifting (Tomas, 1973,1975). For a more extensive review, see Benjamin et al (29). Altoug we ave presented a small sample ere, evidence arising from tese disparate researc areas as generated a great body of literature implicating te presence of criterion variability. Along wit tese empirical results, a literature of teoretical contributions as also emerged (e.g., Kac, 1962; Treisman, 1984; Treisman & Williams, 1985). Strictly speaking, to watever extent quantitative models can account for te penomena of criteria sifting, we can no longer refer to tis as noise in te proper sense of te word. We ere follow earlier writers wo ave disambiguated systematic noise from unsystematic, irreducible, or random noise (Levi, Klein, & Cen, 25; Rosner & Kocanski, 29). We now turn to te researc efforts to separate and measure decision noise. Decision Noise Metods and Models Analysis of te categorical judgment task sowed tat standard signal detection experimental procedures could not generally distinguis representational noise from decision noise witout significant simplifying assumptions (Rosner & Kocanski, 29; Torgerson, 1958). Te first serious researc effort to understand te influence of decision noise began wit Wickelgren and is study of response predictions for a variety of signal detection task conditions in te presence of significant criterion noise (altoug see also Tanner, 1961, for consideration of

9 decision noise under a less rigid interpretation of decision criterion in a 2-alternative forced coice task). In a seminal paper, Wickelgren (1968) examined te ramifications of decision noise for subject performance in yes/no and confidence rating tasks. He derived functional forms for te zroc and sowed tat observers wit non-trivial decision noise could produce linear zrocs as long as decision noise remained constant across criteria and task structure did not alter representational caracteristics (see also Benjamin et al., 29). Static criteria wit Gaussian representational distributions lead to linear zrocs, but linear zrocs do not necessarily imply static criteria. Wickelgren also considered te implications of attenuated criterion noise at a primary decision boundary relative to te remaining criterion boundaries in bipolar confidence rating tasks and te data signature tis affords in a zroc curve (see also Mueller & Weidemann, 28; Murray et al, 22). In particular, e observed tat te subject could exibit a peaked zroc wen criterion noise at te primary decision boundary is significantly less tan te decision noise at te remaining boundaries. Reviewing studies wit greater numbers of category boundaries, e often identified larger peaks, leading to te speculation tat increasing te number of category boundaries could increase decision noise. Tis finding was consistent wit Miller's famous paper on information retrieval (Miller, 1956) and te criterial range interpretation of te range effect (Gravetter & Lockead, 1973) insofar as additional criteria lead to broader criterion spread across te decision axis. Wickelgren's close examination of te sape of subjects' ROCs and zrocs became a standard diagnostic approac for criterion variability in signal detection type tasks. But because data collection in typical yes/no tasks requires bias manipulations tat migt alter eiter representational or decision processes, researcers preferred confidence rating procedures for

1 teir greater assurances of representation and decision noise stability over te duration of te experiment. However, even studies using rating procedures may ave fallen sort of unambiguous estimates of representation and decision variability owing to tradeoffs between tese parameters in estimation (e.g., Mueller & Weidemann, 28; Benjamin et al., 29). Nosofsky (1983) developed a multiple presentation metod to examine te range effect wit an identification task. On individual trials in is study, subjects made multiple responses to repeated identical presentations of a stimulus from one of te available stimulus classes. Altoug e treated eac response as independent of te oters, e assumed tat noisy internal representations were averaged wile decision noise remained constant across presentation repetitions. By separately measuring sensitivity for eac presentation repetition, e demonstrated non-trivial decision and representational noise wit bot components increasing wit larger criterion range. Benjamin et al (29) developed an Ensemble Recognition task similar to te multiple presentation metod of Nosofsky to examine te effects of decision noise in memory recognition. In tis study, subjects were first presented a study list of words tey would later be asked to recognize during a test pase. During te test pase individual trials contained ensembles of one, two, or four words. Eac ensemble contained eiter one, two, four, or no words from te previously examined study list. Te Ensemble Recognition framework assumed tat eac word of eac trial ensemble led to internal activations independent of te oter words, and tat eiter te sum or te average of tese activations would comprise te internal representation at te decision stage. Similar to Nosofsky, tese autors assume tat te decision noise remained constant wile te summing or averaging would lead to adding or averaging of

11 te representational noise. Te averaging model performed best in model selection tests and estimated a very significant role for decision noise in word recognition. More recently, Kellen et al (212) offered a critique of te conclusions drawn from te Ensemble Recognition study and provided new reports on te question of decision noise in memory recognition using a model generalization framework. Tis approac involves combining a 4-alternative forced coice task wit a rating procedure under te traditional assumptions tat internal representations are identical under te two regimes and tat response bias does not play a role in subject response during forced coice tasks. Tey jointly fit teir elaborated SDT model wit decision noise to data from bot te 4AFC and te confidence rating tasks but found virtually no significant decision noise influencing subject performance in teir memory recognition experiments. Rosner and Kocanski (RK; 29) developed a categorical judgment model to separately estimate criterion noise at decision boundaries. Tey corrected an error in an earlier formal description of a categorization task tat allowed for decision noise in absolute identification and confidence rating tasks (Torgerson, 1958). However, RK sowed tat te earlier formulation failed to account for te fact tat truly independent noisy criteria migt overlap from trial to trial and could result in predictions of negative response frequencies. Teir revised formalization accounts for tis overlap and can be reduced to two special cases: in te absence of decision noise te model simplifies to te traditional SDT model, and in te absence of representation noise te model simplifies to a complimentary SDT model (a formulation wic ascribes all response variability to noisy criteria). Using simulated experiments, RK sowed parameter recovery was possible for a range of assumed parameter configurations. Tey argued tat te

12 general formulation of te model disambiguated te conflated parameters, and tat acquiring sufficient degrees of freedom in data posed te only constraint to parameter estimation. In particular, a categorization task wit N stimulus classes and M+1 response categories requires identification of te means and variances of 2N-2 stimulus parameters (assuming a reference stimulus class wit mean and variance 1) and 2M criterion parameters. Tis categorization task as NM independent data points, so tat full model identification is possible only wen NM > 2(N+M)-2; tat is, wen bot N >2 and M >2. For te standard signal detection paradigm wit 2 stimulus classes (N = 2), a solution is available only if te criterion variances are assumed equal at all category boundaries. A New Approac Intuitions and Rationale We develop a framework combining two well-known experimental paradigms to estimate bot representational and decision noise components in signal detection type tasks wit only two stimulus classes, S and S1 (were refers to signal absent trials and 1 refers to signal present trials). Te first paradigm is a confidence-rating task in wic subjects provide a rating Ri indicating teir degree of certainty tat te present trial contains a signal stimulus (Egan, Sulman, & Greenberg, 1959). Te second component is te multi-pass procedure, an external noise paradigm involving multiple presentations of identical stimuli (Burgess & Colborne, 1988; Greene, 1964; Lu & Doser, 28). We sow tat tis combination sufficiently constrains elaborated signal detection models by providing measures of agreement in addition to rating frequencies. Here we offer some basic intuitions to illustrate our strategy for dissociating

13 representation and decision noise components. To begin wit, we simplify our exposition by considering response variability wit a single criterion C wit stimulus class S, were = or 1. If an observer responds differently to two or more trial presentations wit identical stimuli, we attribute te cange in response to internal noise. Researcers ave explored tis basic idea by adding external noise to stimulus presentations in order to estimate internal noise (Barlow, 1957; Pelli, 199; Lu & Doser, 1998, 28). Examples of external noise include random assignment of contrast increments or decrements to individual pixels in a visual stimulus, samples of wite noise added to an auditory stimulus, or any oter random trial-by-trial perturbations to te stimulus. Multiple presentation metods tat utilize external noise assume tat te total noise degrading subject performance is a composite of component noise sources. Te first component, wit standard deviation σext, reflects a variability in te subject's internal representation of te external noise tat is entirely correlated wit te variability in te pysical stimuli. Tis assumption implies tat identical samples of external noise lead to internal representations tat are partly composed of identical offsets along te decision axis. Terefore, a given sample offset reflected by tis consistent noise component depends entirely on te specific noisy stimulus tat evoked it1. Te second component, wit standard deviation σ E, signifies te internal noise induced during trials of stimulus class and reflects random perturbations arising from te encoding of bot signal (if present) and external noise in trial stimuli. Finally, random trial-bytrial sampling of a variable criterion wit standard deviation σc constitutes a tird component. Te distributional parameters of te encoding noise component may be functionally related to 1 To our knowledge, filtered or bandpass noise as not generally been used wit te mutli-pass paradigm. 
However, color or frequency spectrum notwitstanding, we see no difference in te principle assumption tat trial sampled internal noise is comprised of stimulus dependent (consistent) and stimulus independent (random) components.

14 features of a stimulus class (e.g., contrast level), but it is still stocastic in nature and results in random perturbations of te internal representation to identical stimuli. Te criterion variability, by assumption, neiter depends on individual stimulus samples nor on te general stimulus class. We refer to tese secondary noise components as random noise (Levi & Klein, 23) insofar as tey operate independently of any external noise samples (drawn from a single distribution). Terefore, te total response variability σ T during trial presentations of stimulus S, is te combined result of te perturbations arising from consistent and random noise components. 2 2 2 2 σ T = σext +σ E +σc (1) In a multi-pass paradigm, subjects perform a signal detection task over multiple passes of trials. Eac trial from te first pass includes an independent sample of external noise. However, subsequent passes of trials contain te same stimuli and exactly identical samples of external noise as in te first pass (Figure 3). Altoug two passes suffice to obtain an estimate of agreement, in practice experiments often include additional passes for better accuracy and precision. Since any cange in overt response to identical presentations of a stimulus reflects a cange in te internal state of te observer, variability in response to identical stimuli reflects internal noise (Burgess & Colborne, 1988; Green, 1964; Lu & Doser, 28). Researcers can assess to wat extent subject responses agree over multiple presentations of identical samples of noisy stimuli and tis agreement can be used as an additional constraint to determine te ratio (σ 2E +σ 2C )1/ 2 / σext (see Appendix). Low ratios of internal to external noise will lead to greater agreement between responses to identical stimuli, wile iger ratios lead to a decline in agreement. Te estimated statistic of agreement depends on te task specifications but can be

measured with percent agreement (Burgess & Colborne, 1988; Spiegel & Green, 1981; Lu & Dosher, 2008), correlation (Levi & Klein, 2003), or covariance between responses to corresponding trials on successive passes. For multi-pass experiments involving only a single decision criterion, the observed response frequency and response agreement can provide estimates of the total internal-to-external noise ratio in addition to sensitivity and response bias (Green, 1964; Burgess & Colborne, 1988). Treating criterion and encoding variance as separate parameters, however, leaves many possible combinations of criterion and encoding noise that are compatible with the measured combination of HR, FAR, and agreement measures. In a multi-pass signal detection experiment with a single criterion, there are five parameters to estimate (encoding noise for each stimulus class, a mean value for the signal distribution, a criterion mean, and a criterion variance) with only four data points (HR, FAR, agreement on signal-present trials, and agreement on signal-absent trials). Degrees of freedom increase with additional criteria in a rating experiment. Rosner & Kochanski (2009) demonstrated the possibility of independent estimates of criterion variability, criterion positioning, stimulus positioning, and stimulus representational noise (they did not distinguish between consistent and random components) in rating tasks with at least three stimulus levels and four response categories. Estimating these parameters with only two stimulus classes, however, requires additional constraining data measurements. In this paper, we use a multi-pass confidence rating procedure (MCR) and we measure the covariance of responses to trials of a specific stimulus class across different passes as an index of response correlation between these passes. The full covariance matrix provides a compact summary of agreement measures for the same categorization of identical trials across passes (within-category covariance

along the diagonal) as well as disagreement for different categorizations of identical trials (between-category covariance off the diagonal). Conceptually, if trial-by-trial responses over each pass are taken as vector elements, then the covariance gives the (mean-adjusted) dot product of these response vectors. A highly positive covariance estimate implies response agreement across passes. Very low covariance (near zero) implies lack of agreement. Highly negative covariance implies not only lack of agreement but strong disagreement across passes. With low to moderate levels of internal noise, we intuitively expect positive covariance values for within-category estimates along the diagonal of the covariance matrix. For between-category covariance estimates for adjacent regions of decision space (e.g., response assignments of 2 and 3 across passes) we might expect lower though still positive values. For between-category covariance estimates for response assignments of nonadjacent regions (e.g., response assignments of 2 and 5 across passes), we expect nearly zero or negative covariance estimates. Here we show that the MCR procedure sufficiently constrains a class of decision noise models to identify all relevant parameters even when the task involves only two stimulus classes. Under the MCR procedure, each stimulus class gives us M independent response frequencies as well as M independent agreement measures for identical responses between passes. In addition to the covariance of responses for the same rating category across passes (within-category covariance: e.g., response category 2 in the first pass and 2 again in subsequent passes), the covariance of responses for different rating categories across passes may provide even stronger constraints for model fits to data (between-category covariance: e.g., response category 2 in the first pass and 3 in subsequent passes). In total, the MCR provides M(M+3) data points (2M

response frequencies and M(M+1) covariance estimates) to fit 2M+3 free parameters: M criterion positions, M criterion variances, an encoding variance for the signal-absent trials, an encoding variance for the signal-present trials, and the mean position of the signal stimulus along the decision axis (Table 1). Therefore, the MCR procedure may provide sufficient constraints to recover all decision noise parameters for a rating task with as few as three response categories (corresponding to M = 2). To illustrate this point, Figure 4 (left) shows two overlapping and nearly identical ROCs generated using very different underlying internal noise components. In one case, the encoding noise is equal for signal-absent and signal-present trials while decision noise is small for all criteria. In the second case, the encoding noise for signal-present trials is half that for signal-absent trials, while the decision noise varies markedly across criteria and even well exceeds the encoding noise at one of the decision boundaries. Yet, in spite of these very different noise profiles, the resulting ROCs are essentially the same. On the other hand, the covariance measures estimated from an MCR procedure are drastically different (Figure 4, right) and may provide additional constraints to disambiguate the underlying noise components. While a greater number of independent data points relative to the number of free parameters provides a necessary condition for fitting those parameters within the context of a model, it is not sufficient on its own (Busemeyer & Diederich, 2010). Even with more data points relative to free parameters, the data may fail to fully constrain the model and disambiguate the parameters, so that successful model identification depends on more than degrees of freedom alone. We will provide evidence that the MCR framework allows for full parameter recovery

from simulated data over a wide range of conditions. However, we first seek an intuitive demonstration of the relationship between observed data and underlying noise components. While some changes to covariance data are straightforward (e.g., representational noise for a specific stimulus class selectively depresses covariance estimates for responses to that specific stimulus class), the pattern of expected values becomes more complex with the introduction of decision noise: nontrivial decision noise at even a single criterion boundary will lead to changes in covariance and z-scores at all criteria owing to positional overlap. In Figure 5, we examine changes to expected values of response frequencies and covariance structure for a three-category rating task in which we selectively increase the variability for one of the criteria from zero to match the level of variability in the stimulus representation. For this very simple example, we assumed that observers map internal representations to responses according to a corrected Law of Categorical Judgment as described by Rosner and Kochanski (2009; see Decision Rules below). This decision rule determines response assignment by subtracting each trial-sampled representation from trial-sampled criteria and choosing the category where the difference between representation and corresponding criterion gives the least positive value; when all values are negative, the representation is assigned to the highest response category. We begin from the standard SDT account with no decision noise. In this case we assume that two static criteria, each positioned at the mean of the signal-absent and signal-present distributions, divide the decision space into three response categories (Figure 5, top-left). Our example assumes d' = 1 with equal representational noise for the two evidence distributions.
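In code, the decision rule just described can be sketched as follows. This is a minimal illustration; the function name and the explicit 1..M+1 category numbering are our own conventions, not the paper's.

```python
import numpy as np

def lcj_response(representation, sampled_criteria):
    """Corrected Law of Categorical Judgment (sketch): among the trial-sampled
    criteria that exceed the representation, choose the one whose difference
    (criterion - representation) is least positive and respond with that
    criterion's category; if all differences are negative, respond with the
    highest category M+1. Categories are numbered 1..M+1."""
    diffs = np.asarray(sampled_criteria, dtype=float) - representation
    positive = diffs > 0
    if not positive.any():
        return len(sampled_criteria) + 1   # all differences negative
    # index (within the positive subset) of the least positive difference
    return int(np.flatnonzero(positive)[np.argmin(diffs[positive])]) + 1

# With ordered criteria this reproduces the standard SDT partition:
# below both criteria -> 1, between them -> 2, above both -> 3.
```

Note that when a criterion sample is disordered (e.g., the nominally lower criterion samples above the higher one), the rule still returns a unique category, which is the point of a simultaneous decision rule.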
In contrast, we juxtapose a second scenario in which we selectively increase the decision noise for the more lax criterion to match the representational noise, without modifying any of the other

parameters. The joint distributions accounting for both the variability in the criterion and the variability in the signal-absent and signal-present representations are shown as concentric circles (Figure 5, left middle and bottom). The vertical axis represents positions of the noisy criterion, the horizontal axis reflects positions of the noisy internal representations, and the solid blue lines reflect the positions of the means of the noisy and static criteria with respect to the noisy criterion (horizontal blue lines) and representational (vertical blue lines) distributions. Finally, we superimpose rating response column and row labels A, B, C, and D for regions of the joint distributions according to the decision rule described above. For example, when trial samples of both the noisy criterion and representation exceed the stricter (and static) criterion in region DD, some trial representations will be classified as 1 instead of 3 depending on whether the sampled criterion exceeds the sampled representation. Similarly, trial representations will always be classified with a response category of 2 anytime a sampled criterion exceeds the static criterion while the sampled representation does not (regions AD, BD, and CD). Each column of these joint distributions illustrates how some representations falling along the decision axis become reassigned depending on the position of the trial-sampled criterion. In column C, for example, all representations retain a response assignment of 2 except in row C, where some will be reassigned to a response of 1. Figure 5 (right) also shows the corresponding changes to the zROC and covariance in the classical SDT treatment with no decision noise (shown as circles) and with the targeted increase in decision noise at the most lax criterion (shown as '+' symbols). In the case of the zROC plot, we can see how the introduction of decision noise at the more lax criterion results in a small but noticeable change in position for the stricter criterion in z-space. Column D in the joint

distributions shows that response assignments of 3 can only decrease with increased decision noise at the more lax criterion, and no responses previously mapped to 1 or 2 will be reassigned to 3 according to the parameters we have chosen for this illustration. This net loss of assignments to 3 occurs for both signal-absent and signal-present trials and is reflected by a shift in the criterion estimate in the zROC towards the bottom left. Similarly, columns A and B show how the criterion variability on signal-absent trials results in a net decrease of response assignments mapped to 1, leading to a significant rightward shift in the more lax criterion estimate in zROC space: losses from region BB are canceled by gains in region CC, but regions AA, BA, AD, and BD all lose response assignments of 1 without corresponding counterbalancing regions. These regional reassignments are also true for signal-present trials, but in this case region CC represents a much higher likelihood under the joint density function than is counterbalanced by regions AA, BA, AD, and BD. These regional exchanges, coupled with an additional increase in 1 responses from region DD to counterbalance losses in region BB, result in a very slight net increase in response assignments of 1 with a corresponding subtle downward shift in the position of the more lax criterion in the zROC plot. We can also observe that this increased decision noise changes the covariance data, though overall response frequency will also affect this measure in addition to the correlation in responses across passes. For both signal-absent and signal-present trials, the covariances for response assignments of 3 decrease due to lower correlations and lower response frequencies when trial samples of both criterion and representation fall within region DD. Within-category covariance for response assignments of 2 also decreases with increased decision noise for signal-absent trials since many of the regions previously assigned to 1

become remapped to 2 under the joint distribution. Although the remapping of these regions also occurs during signal-present trials, covariance for response assignments of 2 nets a small increase here because the overall response frequency increases with decision noise, but the shifted position of the signal-present joint distribution leads to a smaller drop in correlation than occurs in signal-absent trials (note the lower impact of regions AD, BA, and BB). On the other hand, the between-category covariance of responses 2 and 3 becomes increasingly negative on both signal-absent and signal-present trials. These negative covariances occur because response assignments of both 2 and 3 become increasingly associated with 1 on subsequent passes, thereby decreasing the 2-3 covariance from baseline.

Decision Rules

For any task amenable to analysis within the signal detection framework, SDT assumes observers generate responses by comparing internal representations of the trial stimulus with one or more decision criteria. A decision rule constitutes a specific protocol that determines how an observer assigns an internal representation to a response. With static criteria, most straightforward decision rules predict identical responses for any given trial-sampled representation. With noisy criteria, the situation may be quite complex. When the task involves only a single noisy criterion (yes/no, 2AFC, 2IFC with bias, etc.), no ambiguity arises in consideration of this comparison. Similarly, for tasks calling for multiple criteria (rating procedures, identification, classification, etc.), it is straightforward to map a trial-sampled representation to a response as long as the noisy criteria do not overlap from trial to trial. We might even expect the operation of an enforcement mechanism maintaining ordinal relations between trial-sampled criteria (Treisman & Faulkner, 1984).
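How often trial-sampled criteria actually invert depends on how much their distributions overlap. A small Monte Carlo sketch, with hypothetical parameter values, illustrates the point:

```python
import numpy as np

def inversion_rate(mu1, sigma1, mu2, sigma2, n=100_000, seed=0):
    """Monte Carlo estimate of the probability that two independently
    sampled Gaussian criteria (nominal order mu1 < mu2) come out
    disordered on a given trial."""
    rng = np.random.default_rng(seed)
    c1 = rng.normal(mu1, sigma1, n)
    c2 = rng.normal(mu2, sigma2, n)
    return np.mean(c1 > c2)

# Widely separated, low-variance criteria almost never invert;
# heavily overlapping criteria invert on a substantial fraction of trials.
rare = inversion_rate(0.0, 0.1, 1.0, 0.1)    # ~ 0
common = inversion_rate(0.0, 0.5, 0.3, 0.5)  # ~ Phi(-0.3/sqrt(0.5)) ~ 0.34
```

Analytically, the inversion probability is Φ((μ1 − μ2)/√(σ1² + σ2²)), which is what the simulation approximates.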

When noisy criteria have overlapping distributions, trial-sampled criteria may sometimes become disordered along the axis, requiring subjects to implement a more complicated decision rule. Simultaneous decision rules require the observer to compare the internal representation with all available criteria at once. These decision rules then determine a response category by making a unique selection among the results of these comparisons. The work in this paper focuses on several forms of simultaneous decision rules. We first formulate the simultaneous decision rule used by Rosner and Kochanski (RK): subtract the position of the stimulus representation from each criterion boundary and respond with the category affording the least positive distance; if all differences are negative, respond with category M+1. Following a notation similar to RK's, let s ~ G(0, 1), where G(μ, σ) is a Gaussian random variable with mean μ and variance σ². Then s·σ_E equals the random offset of the internal response from its mean position μ_S due to the subject's encoding noise during a trial of stimulus class S. Also, let c_i ~ G(0, 1), so that c_i·σ_Ci equals a trial-sampled offset of the i-th criterion from its mean location μ_Ci due to the subject's internal decision noise at that boundary. We now assume a single external noise level σ_ext = 1, so that all parameters are estimated in reference to this term. We let s_ext equal an observer's consistent trial-by-trial offset to the internal representation due to presentation of a specific sample of Gaussian external noise, so that s_ext ~ G(0, 1). The RK decision rule just described can be formalized as follows: for a trial-sampled stimulus of class S, choose the category m when the following equation evaluates to true, or category M+1 if the equation evaluates false for all m:

s_ext + s·σ_E + μ_S < c_m·σ_Cm + μ_Cm < min_{m'≠m} { c_m'·σ_Cm' + μ_Cm' : s_ext + s·σ_E + μ_S < c_m'·σ_Cm' + μ_Cm' }   (2)

Klauer & Kellen (2012) proposed two alternative simultaneous decision rules. In the first of these alternatives, the decision rule determines the trial-by-trial response according to the rule: subtract the M criterion boundaries from the trial-sampled stimulus representation and respond with the category m+1 yielding the smallest positive distance; in the event all comparisons are negative, choose category 1. The second rule determines the trial-by-trial response by computing the least absolute distance between criterion boundaries and the trial-sampled representation. Specifically, subtract the stimulus representation from all M criterion boundaries, identify the smallest absolute value of the difference between stimulus representation and criterion boundary m, and choose category m if the difference is positive and m+1 otherwise. This second rule also has the additional consequence that rating frequencies will be symmetrically distributed when the corresponding means of the criteria distributions are symmetrically distributed about an evidence distribution. Given any M > 1 trial-sampled criteria, these decision rules can be used to map any trial-sampled internal representation to overt observer responses. To distinguish these three decision rules, we follow Klauer and Kellen (2012) and denote RK's Law of Categorical Judgment as LCJ (given by equation 2); we denote the second (Klauer and Kellen's complementary version of the LCJ) as LCJc, and the last as LCJsym due to its symmetric treatment of criterial boundaries relative to trial-sampled representations. Figure 6 contrasts the response mappings for each of these three decision rules when trial-sampled criteria overlap. For a given sample of criteria, the rules prescribe different response profiles for stimuli falling in a given region along the decision axis. Note that for any given overlapping criteria the LCJ and

LCJc prescribe entirely incongruent responses while LCJsym shows some response agreement with both. These differences suggest the possibility that the LCJ will produce distinctly different data patterns in the aggregate from the LCJc rule and moderately different patterns from the LCJsym rule. With these three different decision rules in hand, we examined the possibility of parameter recovery in simulated MCR experiments using simultaneous decision rules that either matched or mismatched the rule used to generate simulated data.

Simulation Study

In the present study, we recruit the power of external noise and the MCR method in a confidence rating task to disambiguate and estimate criterion noise under the various simultaneous decision rules LCJ, LCJc, and LCJsym. We derived the expected values of the response frequencies and covariance data conditioned on trial-by-trial samples of external noise. Here in the main text we show the equations describing the LCJ. For a formal description of LCJc and LCJsym, please see Appendix A. For the LCJ decision rule, the expected response frequencies conditioned on the external noise sample s_ext for stimulus class S are given as

P(R = m | s_ext, S) = ∫ φ(c_m) [ ∫_{−∞}^{μ_Cm + c_m·σ_Cm} φ(s_E; s_ext + μ_S, σ_E) ∏_{m'≠m} ( 1 − ∫_{s_E}^{μ_Cm + c_m·σ_Cm} φ(c_m'; μ_Cm', σ_Cm') dc_m' ) ds_E ] dc_m   (3)

where φ(x; μ, σ) is the Gaussian probability density function (φ(x) denoting the standard normal density). We then easily determine P(R = M+1 | s_ext, S) as 1 − Σ_{m=1}^{M} P(R = m | s_ext, S). The first term in eq. 3 integrates over all possible values of the m-th criterion. The middle term integrates over stimulus representation values up to that criterion. The third term estimates the probability that the response is consistent

with any other criterion. We then integrate over all external noise samples s_ext to get the overall response frequency for this stimulus class:

P(R = m | S) = ∫ P(R = m | s_ext, S) φ(s_ext) ds_ext   (4)

Similarly, across any two passes i and j, the covariance between any two response categories m and m' is

Cov[R_i = m, R_j = m' | S] = ∫ P(R_i = m | s_ext, S) P(R_j = m' | s_ext, S) φ(s_ext) ds_ext − P(R_i = m | S) P(R_j = m' | S)   (5)

We now show that data from the MCR experiment adequately constrain the models to uniquely identify individual representational and decision noise components. We approach this problem by examining the precision, accuracy, and goodness-of-fit of recovered model parameters from simulated data. For each decision rule adopted by our simulated observer, we tested parameter recovery when fitting simulation data with matched models (e.g., LCJ fitted to data generated with a simulated observer using LCJ) as well as when fitted with mismatched models (LCJc and LCJsym fitted to data generated with a simulated observer using LCJ). In the multi-pass framework, response frequencies and the covariances of responses across passes are estimated. This covariance data paired with the rating responses sufficiently specifies the models for independent identification of encoding and decision noise contributions.

Methods

Rationale. In order to demonstrate full parameter recovery for the model using our new framework, we simulated a number of MCR experiments under a range of noise configurations. Because MCR experiments schedule identical stimuli over each pass, data collection may require significant empirical investment. Since the minimal data for acceptable model recovery was of

interest, we examined not only the possibility but also the feasibility of parameter recovery at different numbers of trials and passes per simulated experiment. Our simulations investigated several plausible configurations for the parameters of criterion and stimulus distributions using three response categories and two stimulus classes. We focus on the minimum number of stimuli and rating categories because earlier efforts towards parameter recovery became problematic with fewer response categories. We investigated configurations in which either the criterion noise variances or the encoding noise variances were equated along the decision axis (labeled equ), increased along the decision axis (labeled asc), or decreased along the decision axis (labeled des). We assume a single external noise variance of unity for all stimulus classes, with an external noise mean of zero. For any given variance configuration, max[σ_E1², σ_E2²] ≤ 1 and max[σ_C1², σ_C2², ..., σ_CM²] ≤ 1. We also normalized the sum of the highest decision and encoding noise variances to equal the variance of the external noise. In other words, max[σ_E1², σ_E2²] + max[σ_C1², σ_C2², ..., σ_CM²] = σ_ext². This constraint accords with the reports of previous authors that the total internal noise lies near this level for visual and auditory detection and discrimination experiments over a considerable range of external noise levels² (Burgess & Colborne, 1988; Green, 1964; Lu & Dosher, 2008). For all other noise components, we computed variances by applying logarithmic decrements in the ascending and descending conditions. We positioned each criterion mean along the decision axis at (1/3)(σ_ext² + σ_E²)^(1/2) and (2/3)(σ_ext² + σ_E²)^(1/2) so that we could ensure a robust level of trial-by-trial criterion overlap. Finally, we kept the position of the mean of the signal distribution at

² The dependence of internal noise on external noise is predicted from observer models that show internal noise increases with the total energy of the stimulus (Lu & Dosher, 2008).

(σ_ext² + σ_E²)^(1/2). The various arrangements of parameter configurations are shown in Table 2 and Figure 7. The simulated experiments emulated a confidence rating detection paradigm in which an observer maintains two criteria that define three response categories. The simulated observer implemented an LCJ decision rule for all noise level configurations. We also generated simulated data with the LCJc and LCJsym decision rules for a single parameter configuration in which decision and encoding noise are equal across criterion boundaries and stimulus classes. The probability of a signal-present stimulus was 0.5. The simulated experiments varied the number of trials per pass and the number of passes per experiment, in addition to the specific parameter configuration. The number of trials n per pass was 250, 500, or 1000 and the number of passes was either four or six. We set the minimum number of passes to four in order to obtain variance estimates on covariance data for weighted-least-squares model fitting.

Data analysis. The data were arranged in this way: for each stimulus class S, we have M+1 subject response matrices R^(m,S) of size T × J, where J is the number of passes, T is the number of trials per pass, and m is an available response category. Each entry of R^(m,S) contains 1's for trial responses to stimulus class S classified as category m and 0's otherwise. Thus, we denote r_j^(m,S) as the j-th T × 1 column vector of the matrix R^(m,S), with the t-th entry r_tj^(m,S) equal to 1 or 0, signifying whether or not subjects classified the stimulus from the t-th trial of the j-th pass with a classification of m. The matrix corresponding to the lowest confidence rating, R^(1,S), was dropped due to its redundancy given the other response rates and fixed trial numbers.
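The arrangement just described can be sketched as follows. This is illustrative only; the helper name is hypothetical.

```python
import numpy as np

def indicator_matrices(responses, n_categories):
    """Build the M+1 indicator matrices R^(m,S) described in the text from a
    T x J matrix of category responses (one column per pass). Entry (t, j) of
    R[m] is 1 iff trial t of pass j was classified as category m."""
    responses = np.asarray(responses)
    return {m: (responses == m).astype(int) for m in range(1, n_categories + 1)}

# Toy example: 3 trials, 2 passes, three response categories.
R = indicator_matrices([[1, 2], [3, 3], [2, 2]], n_categories=3)
```

Because every trial receives exactly one rating, the indicator matrices sum to a matrix of ones across categories, which is why one matrix (the lowest-confidence one in the text) is redundant.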

For every simulated experiment, we computed the relative frequency of the m-th classification rating during each pass j as

p_j(r = m | S) = (1/T) Σ_{t=1}^{T} r_tj^(m,S)   (6)

The average of each response rating across all passes is the best and final estimate of the rating response rate. That is,

p(r = m | S) = (1/J) Σ_{j=1}^{J} p_j(r = m | S)   (7)

Covariance was computed for every combination of passes for every rating category. For passes i and j, where i ≠ j, and category ratings m and m', the covariance is given as

Cov[r_i^(m,S), r_j^(m',S)] = (1/(T−1)) Σ_{t=1}^{T} [r_ti^(m,S) − p_i(r = m | S)][r_tj^(m',S) − p_j(r = m' | S)]   (8)

We refer to the covariance as within-category covariance when m = m' and between-category covariance when m ≠ m'. For an MCR experiment with J passes, we have Σ_{j=1}^{J−1} j observations of within-category covariance estimates for each response rating m, and 2 Σ_{j=1}^{J−1} j observations of between-category covariance estimates for each response pairing of m and m'. We took the average of all pairwise estimates as our final covariance estimate between categories m and m'. Weighted least-squares model estimation requires estimates of the variance for each of the final response rates. The variability of the response rates for each pass was estimated by the variance of each response rate across all passes:

Var[p_j(r = m | S)] = (1/(J−1)) Σ_{j=1}^{J} [p_j(r = m | S) − p(r = m | S)]²   (9)
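Equations 6-8 can be computed directly from a T × J matrix of category responses. The following sketch (function name hypothetical) returns the pass-averaged rates and the averaged pairwise covariance estimates:

```python
import numpy as np

def mcr_summaries(R):
    """Given a T x J response matrix R (category labels: trials x passes),
    return (a) response rates averaged over passes (eqs. 6-7) and
    (b) within/between-category covariances averaged over all pass
    pairings (eq. 8). Illustrative sketch, not the paper's code."""
    R = np.asarray(R)
    T, J = R.shape
    cats = np.unique(R)
    # eq. 6 per pass, then eq. 7: average across passes
    rates = {m: np.mean([(R[:, j] == m).mean() for j in range(J)]) for m in cats}
    cov = {}
    for m in cats:
        for mp in cats:
            ests = []
            for i in range(J):
                for j in range(J):
                    if i == j:
                        continue  # eq. 8 requires distinct passes
                    x = (R[:, i] == m).astype(float)
                    y = (R[:, j] == mp).astype(float)
                    ests.append(np.sum((x - x.mean()) * (y - y.mean())) / (T - 1))
            cov[(m, mp)] = np.mean(ests)  # average over pass pairings
    return rates, cov

# Perfect agreement across two passes: within-category covariance is
# positive, between-category covariance is negative.
rates, cov = mcr_summaries([[1, 1], [1, 1], [2, 2], [2, 2], [3, 3], [3, 3]])
```

The diagonal/off-diagonal sign pattern in this toy case matches the intuition developed earlier: agreement inflates within-category entries, while a trial can only belong to one category at a time, pushing between-category entries below zero.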

The final estimate of each response rate is the average of the response rates across passes, and the final estimate of variance for an averaged response rate across all passes is given by dividing the variance among individual passes by the total number of passes. That is,

Var[p(r = m | S)] = Var[p_j(r = m | S)] / J   (10)

Variances for covariance data were computed by first taking the variance of each within- and between-pass estimate and then dividing by the Σ_{j=1}^{J−1} j or 2 Σ_{j=1}^{J−1} j possible pairing combinations, respectively.

Modeling. We fit the LCJ, LCJc, and LCJsym to simulated data derived from each parameter configuration and the LCJ decision rule, and to simulated data derived from one parameter configuration using the LCJc and LCJsym decision rules. Model fits used a Matlab simplex optimization routine (Nelder-Mead) and a weighted least-squares cost function. The cost function heavily penalized a possible solution if any variance parameters fell below zero or if the criterion means violated their ordinal relation. At the beginning of each parameter search routine, we generated initial starting parameters by independently perturbing the true means of each parameter using a Gaussian random number generator with a standard deviation of 0.15σ_ext. Apart from the penalties just stated, the constraints imposed on the parameters of the simulated observer were not imposed upon the model during parameter recovery: candidate fits of criteria and signal distribution means were not restricted to specific positions along the decision axis, nor were they restricted to maintain certain relative distances; nor were any decision and encoding noise variances constrained to sum to unity. We ran 250 experiments at each experimental condition

and at each parameter configuration.

Results

We computed the median and 95% confidence interval for each model parameter using the 250 simulated runs at each parameter configuration and pass-trial combination. In every case, the actual parameter values of the simulated observer fell within the 95% confidence intervals of the estimated values for each position and variance parameter. The median parameter values recovered from the matched model were very close to the parameter values used to generate the simulated data. These results stand in contrast to the attempted parameter recovery for decision protocols of the models mismatched against the decision rule of the observer. In the case of LCJc fitted to the data simulated with LCJ, at least one generative parameter failed to fall within the 95% confidence interval when simulations were run with four passes at 500 trials/pass or with six passes at 250 trials/pass. When we fitted LCJsym to the data simulated with LCJ, at least one generative parameter failed to fall within the 95% confidence intervals when simulations were run with four passes at 500 trials/pass. We also examined the precision and accuracy of our model fits as a function of trials per pass and passes per experiment. We calculated the standard error (SE) of individual recovered parameters by computing the standard deviation of each fitted parameter across all experiments within a given noise configuration, trials/pass, and passes/experiment setting. Similarly, we estimated an individual parameter's mean-squared error (MSE) by squaring the difference between the true parameter value adopted by the simulated observer and the corresponding fitted parameter in each experiment and averaging across all experiments within the given configuration, trials/pass, and passes/experiment setting. Mean SEs (averaged across all model

parameters), as well as the SE of the most variable parameter, strictly decrease with increasing trials per pass and passes per experiment at each experimental configuration (Figure 8). Mean MSEs (again, averaged across all model parameters) also exhibit a pattern of increasing accuracy (decreasing MSE) with greater numbers of trials and passes for the correctly matched decision rule (Figure 9). The MSEs of the most poorly fitted parameters (i.e., those parameters with the highest MSE) also decrease with increasing trials and increasing passes (a single exception occurs in the DN-asc EN-des configuration at 500 trials/pass comparing four vs. six passes per experiment). We also examined fits at six passes/experiment for mismatched relative to matched models (Figure 10). For both fits of LCJc and LCJsym to an observer using LCJ, the averages of the MSE for mismatched protocols do not generally decrease monotonically with trials/pass or passes/experiment. Furthermore, at six passes/experiment, fits for both mismatched models show a higher average MSE across all trials/experiment relative to MSE for the correctly matched model for all configurations except DN-0 EN-asc. The models perform equally well for simulations assuming zero decision noise because the models make identical predictions for negligible decision noise. For one parameter configuration, we used both LCJc and LCJsym as our simulation decision rule (Figure 10, bottom). Here too, accuracy improved for matched but not mismatched models with increasing trials. An important concern is whether differences in parameter recovery between matched and mismatched models correspond to goodness-of-fit when actual underlying parameters are unknown. A weighted least squares estimate (χ²) finds parameters that minimize the difference between simulated data and expected values of data based on recovered parameters. We

computed χ² for each fit of matched and mismatched models to each simulated data set. We averaged across simulations from a given configuration and trials/pass setting using six passes/experiment for mismatched and correctly matched models. In this case, the average χ² for the correctly matched model remains nearly constant with increasing trials/experiment (Figure 11). On the other hand, average χ² for mismatched models increases with increasing trials/experiment for all configurations except DN-0 EN-asc. In contrast to the other configurations, average χ² fits for DN-0 EN-asc are notably consistent across both matched and mismatched fits. For simulated observers with zero decision noise, fits show increasing accuracy while the log of the mean chi-square fits lies within a narrow range across all trials/experiment for all model protocols. We also investigated the frequency with which the model fits for the correctly matched model resulted in lower weighted least squares costs than fits for mismatched models. For every configuration except DN-0 EN-asc, χ² fits were lower for correctly matched models than mismatched models for at least 91% of the individual simulations with four passes and 250 trials/pass. This lower bound on success rate increased to 97% for individual simulations with six passes and 1000 trials/pass. We also examined MSE and χ² for model fits to data generated using the LCJc and LCJsym decision rules for a single parameter configuration, DN-equ EN-equ (Figure 11, bottom). Similar to the results when using the LCJ as the generative model, MSE decreased with additional trials for correctly matched rules but did not generally show similar decreases with mismatched rules. Again, the χ² results for models matched to the generative model remained low with increasing trials, while the χ² increased with increasing trials for mismatched models. When using LCJc as the generative decision rule, χ² fits for correctly matched models were lower than mismatched

models for at least 90% of the individual simulations with four passes and 25 trials/pass. This lower bound success rate increased to 99% of individual simulations with six passes and 100 trials/pass. However, when using the LCJsym as the generative decision rule, the success rate decreased significantly for correctly matched models relative to mismatched models, at 60% of individual simulations with four passes and 25 trials/pass, increasing to 80% with six passes and 100 trials/pass.

Discussion

Previous attempts to estimate decision noise in simple-response, signal-detection-type tasks with two stimulus classes have required strong simplifying assumptions about the various noise components. Here we demonstrate that an MCR procedure provides a sufficiently rich data set to effectively recover decision noise parameters in many representative parameter configurations without assuming specific relationships between noise components. Importantly, this framework uses a model that permits overlapping criterion distributions and a decision rule that deals with this possible overlap. The results show that both the precision (1/SE) and the accuracy (1/MSE) of the parameters increase with the number of trials/pass and passes/experiment. Furthermore, model fitting is not only possible, but also feasible with a number of total trials amenable to typical experiments in psychophysical studies. For all parameter configurations, it appears that parameter recovery does no worse and often improves with total number of trials up to 2000 total trials. However, within the range of 300 to 400 total trials, allocating fewer trials over more passes results in better average accuracy than a greater number of total trials distributed over fewer passes for some parameter configurations (cf. DN-asc EN-equ, and DN-0 EN-asc). Still, though

the optimal allocation strategy may depend on the underlying parameter configuration, the accuracy generally appears to improve with the total number of trials. For the configuration assuming zero decision noise, our simulations showed that all three decision models gave accurate and precise fits to the data of simulated experiments. This result should come as no surprise because each of the protocols prescribes identical trial-by-trial responses to a trial-sampled representation when criteria remain static over the course of the experiment. However, the results for accuracy look quite different for mismatched model and simulation protocols for all configurations imposing non-trivial decision noise. In every configuration with decision noise, the accuracy and χ2 estimates are much worse relative to correctly matched model fits. In these cases, the accuracy generally fails to improve in any significant way with increasing trials/pass or passes/experiment, and the χ2 estimates become notably worse. The failure of these models to fit simulated data from mismatched protocols shows that the χ2 estimates of recovered parameters for correctly matched pairings do not result from under-constrained models. It appears that some combinations of response frequencies and covariance data are simply not compatible with data sets generated by certain decision protocols. Therefore, fitting a decision rule model to data derived from an MCR experiment could recover erroneous estimates of the underlying parameters when the model rule fails to match the decision strategy of the observer. At least in some cases, however, mismatched models can be ruled out by comparison to fits of models more closely aligned with decision rules used by the observer. Some positive evidence exists suggesting that the experimenter may manipulate the observer's decision strategy by instruction and task structure (Treisman & Faulkner, 1985). However, a more parsimonious approach would attempt to disambiguate potential protocols through model

selection techniques. In a related study, we investigated the possibility of trade-offs between decision and encoding variance parameters. That is, for a given data set of response frequencies and covariance estimates, are variances associated with decision and encoding processes fungible? Using the LCJ decision rule, we generated expected values of response frequencies and covariance data using the same underlying parameter sets from our simulation study (Table 2) for three response categories. We then independently perturbed these generative parameters using a Gaussian random number generator with a standard deviation of 0.15σext. We then used these perturbed parameters as an initial guess in model fitting routines to assess how changes in model parameters led to differences between expected values in the data obtained from our generative parameters. We penalized violations of criterion ordering along the decision axis, but we did not constrain our model fitting with the same constraints imposed on our simulated observer: decision and encoding noise variances were not constrained to sum to unity. We obtained fits for 500 iterations at each parameter configuration. The norm of the difference between expected values resulting from the fitting routine and those given by the true generative parameters was always greater than zero when the search failed to converge on the true parameters. That is, we did not find any alternative model solutions that resulted in zero cost. Finally, we compared the expected values of the LCJ for each of our representative parameter settings with those obtained when random numbers were given as parameter inputs to the model. The sum of squared differences between model outputs for the representative parameter sets and model outputs for randomly selected parameters generally increased with the Euclidean distance between parameter sets. This relationship was not monotonic, but a general

trend showed an increasing sum of squared error with increasing distance between parameters. We have demonstrated the feasibility of recovering estimates for decision noise as well as encoding noise within an expanded signal detection framework for representative parameter configurations. These configurations imposed identical positioning of the criteria and signal distribution means, and caps on the total noise at the decision stage. While we do not believe that this circumstance poses any fundamental constraints on the application of our framework, more complex configurations might lead to more variable parameter estimation. For example, a higher overall total internal noise relative to external noise would necessitate a greater number of total trials in order to achieve comparable levels of accuracy and precision in parameter estimates. Nevertheless, the total internal noise levels assumed by our simulated observer lay well within the range often reported in multi-pass experiments (Burgess & Colborne, 1988; Green, 1964; Lu & Dosher, 2008). While simulation studies cannot guarantee that the parameters of the decision noise models considered here uniquely map to confidence rating and covariance estimates, we believe the demonstrations given here provide strong evidence for the efficacy of the procedure in resolving and identifying factors underlying response variability.

Application

We applied our framework to a simple visual detection confidence rating experiment in order to assess the degree to which decision noise contributes to response variability, and to investigate the dependence of noise components on the response structure of the task. We conducted a multi-pass, Gabor detection experiment with external noise in foveal vision (see appendix for additional details). Subjects performed in sessions with both three and five rating categories each day. For each subject and for each rating scale, we collected response frequencies

and covariance estimates for signal-absent and signal-present trials across five days. We cumulatively summed response frequencies with traditional zROC plots and also plotted both within- and between-category covariance estimates for signal-present and signal-absent trials. We found the best fitting lines for zROCs (fit to both coordinates) estimated with yes rates for experiments with three response categories fell above the best fitting line of the zROC determined by yes rates for experiments with five response categories for both subjects (Figure 12). This result is consistent with the prediction of Benjamin et al. (2013) that more response categories are associated with more decision noise. We then fit our data with each of the three decision noise models (LCJ, LCJc, and LCJsym) and the classical signal detection model without decision noise, csdt. Our criterion for model selection among those with an equal number of parameters (i.e., LCJ, LCJc, and LCJsym) was simply to choose the model with the lowest weighted least-squares cost function. For selection between these more complex models and the simpler reduced model csdt, we used F-tests for nested models (Wonnacott & Wonnacott, 1981). For both subjects, the decision noise models did not fit the data significantly better than the csdt model without decision noise when subjects used only three response categories. With five response categories, the LCJ decision noise model fit the data better than the LCJc or LCJsym models, and it provided significantly better fits than the csdt model for both subjects. For further verification of the LCJ model fits to data with five response categories, we randomly sampled, analyzed, and modeled subject responses to 80% of trial stimuli, and then computed the r2 between predictions of the model with these parameters and the remaining 20% of the data. Repeating this procedure for over 100 repeated samples, we found median rroc = 0.99 in zROC data and rcov = 0.82 in covariance estimates for subject CC, and median rroc = 0.97 in zROC data

and rcov = 0.87 in covariance estimates for subject YZ. We also examined whether representational parameters at the decision stage remained constant across three and five response categories. We fit LCJ to subject data from the five-category rating experiment while jointly fitting the csdt to the three-category rating experiment. We either allowed all parameters to vary freely, or assumed that the representation-related parameters σE0, σE1, and μS1 remained identical across response structures. For both subjects, fits using the representation-constrained model were statistically equivalent to the unconstrained model, suggesting stationary representational distributions but decision noise increasing with the number of response categories. These preliminary findings suggest that decision noise may play a larger role in task processing when tasks require a large number of response categories. Of course, other SDT models might be generating the observed data patterns; for example, the data may be generated by a mixture model in which a sample representation from a signal-present trial may derive from one of two underlying distributions (DeCarlo, 2002). When a trial is well attended, the trial representation is sampled from a distribution with mean μS1 and variance σ²ext + σ²E1. If, however, the trial occurred during a lapse of attention, then the trial representation is sampled from a distribution with mean 0 and variance σ²ext + σ²E0. A mixture parameter λ determines the base rates for attended and unattended signal-present trials. Relative to the LCJ model, the mixture model also provided very good fits to the data, but the parameter λ changed inconsistently from three to five response categories for each subject. Cross-validation results from the mixture model and those obtained with the LCJ decision model resulted in very

similar performance outcomes, so we are unable to distinguish between these with our experimental data (see appendix for details). Nevertheless, the success of the mixture model in accounting for the data patterns in an MCR experiment raises the question of whether the decision noise models might mischaracterize response variability generated from attentional lapses as variability arising from decision mechanisms. We carried out a preliminary study by fitting decision noise models to simulated data from a mixture distribution. Despite assuming a 5% lapse rate well within the typical range assumed in attentional lapse studies, the decision noise models did not mis-attribute attentional lapses to a decision noise mechanism (see appendix for further details). These results suggest that the decision noise estimates from the decision noise models considered here are not mistakenly conflating decision noise with lapses in attention as an alternative mechanism of response variability.

General Discussion

In this paper, we present a new framework for understanding performance in signal detection tasks that combines rating responses with multi-pass measurements. The framework resolves response variability arising from representation and decision processes, and can be applied to tasks with only two stimulus classes. Combined use of rating responses and multi-pass procedures provides stronger constraints on parameter estimation in extended SDT models with decision noise. A multi-pass procedure allows for a measure of total internal noise relative to consistent noise, but this technique by itself cannot achieve any further resolution of noise beyond this first-order partitioning. A rating response task with more than two stimulus classes may provide for separate estimates of decision and representation noise, but the efficacy

of this approach does not extend to experiments with only two stimulus classes without significantly simplifying assumptions about the underlying noise levels. Our combination of these two approaches provides a set of observations rich enough to separate and measure contributions of noise components at the decision stage. The MCR procedure can be used whenever meaningful external noise manipulations can be defined for the stimulus set (see below). We demonstrated the efficacy of our framework by simulating MCR experiments for observers with a number of underlying noise configurations. We modeled the data from each of these experiments and found that precision and accuracy of parameter fits improved by increasing the number of trials and passes. For each tested configuration, we found these measures improved when averaged over all parameters as well as when considering only the worst performing parameters. That each of these improvements depended on the number of trials and passes gives us strong evidence that response frequencies and response agreement estimates together constrained the extended SDT model with decision noise. Importantly, models with mismatched decision rules generally provided worse χ2, with worsening results as the number of trials increased. This suggests that the framework is robust to model misspecification and that methods of model selection could help identify underlying decision rules in addition to model parameters. We also deployed this framework in a visual detection confidence-rating task with multiple passes. MCR procedures afforded estimates of response agreement in addition to response frequencies. For both subjects, the data were better explained by an extended SDT model with decision noise for tasks with five response categories. When only using three

response categories, the decision noise model did not provide significantly better fits than the classical SDT model without decision noise. For many applications of SDT in which subjects may respond with a limited number of alternative categories, our result suggests the static criterion assumption of classical SDT remains valid and useful. However, ROCs for our subjects included features consistent with decision noise, like peaked midpoints and lower performing operating characteristics, for five but not three response categories. When a task structure offers a larger number of response categories, decision noise may become an important determinant in trial-by-trial response outcomes. Of course, the models we use to interpret our data affect what kinds of conclusions we may draw, and the classical signal detection model can be elaborated in a number of ways. A mixture model with static criteria (DeCarlo, 2002) provided very good fits as well when applied to our data. Moreover, the assumption of a latent distribution in the mixture model seems no less plausible than the assumption of fluctuating criteria in decision noise models. It may be the case that the decision noise models considered here misattribute an underlying latent distribution to greater variability in the criteria. To test this consideration, we ran 25 additional simulated experiments of an MCR procedure to emulate an observer with static criteria. We assumed equal variance for signal-absent and signal-present distributions, sensitivity (d') equal to one, and a 5% rate of attention lapses modeled by sampling from a latent signal-present distribution with mean of zero. We simulated six passes of 500 trials each to match our experimental procedure and then fit these simulated data sets with each of our decision noise models. The median model fits showed recovered parameters quite close to the actual generative parameters used in the simulations (Table 7).
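The lapse observer used in these simulations can be sketched in a few lines. The fragment below is an illustrative sketch under the stated assumptions (equal-variance distributions, d' = 1, static criteria, lapses drawn from a zero-mean latent distribution); the function name, the NumPy usage, and the specific criterion values are our own, not the code used in the study.

```python
import numpy as np

def simulate_static_ratings(n_trials, criteria, d_prime=1.0,
                            lapse_rate=0.05, seed=0):
    """One simulated pass of signal-present trials for an observer with
    static (noise-free) criteria under the equal-variance model.

    On a lapse trial the sample is drawn from a latent distribution with
    mean zero; otherwise it is drawn from N(d', 1). The rating is the
    number of fixed, ordered criteria falling below the sample.
    """
    rng = np.random.default_rng(seed)
    lapses = rng.random(n_trials) < lapse_rate      # 5% lapse rate by default
    means = np.where(lapses, 0.0, d_prime)          # latent mean 0 on lapses
    samples = rng.normal(means, 1.0)
    # searchsorted assigns rating k when criteria[k-1] < sample <= criteria[k]
    return np.searchsorted(criteria, samples), lapses
```

Repeating such simulated passes and fitting the decision noise models to the pooled frequencies and agreement estimates is the logic of the check reported above.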
In particular, median fits for criterion variances were very nearly zero, and median estimates of the positions of these criteria

only slightly underestimated the true locations along the decision axis. The median fits for encoding parameters also closely matched the underlying generative parameters, although in this case the solutions converged with considerable variability and sometimes resulted in entirely unrealistic parameter values. Distinguishing between elaborated SDT models positing alternative mechanisms will require future experimental work, and the developments presented in this paper allow for the consideration of explanations involving decision noise that were not previously available. Key features of ROC and zROC data do not depend on the static criterion assumption and in some cases contradict it. In the case of rating procedures, our framework now provides a way to identify and quantify the separate contributions of encoding and decision noise to these features. For example, some researchers have noted that the peaks in empirical zROCs could emerge with highly stable central criteria and highly variable criterion boundaries at more extreme positions (Mueller & Weidemann, 2008; Wickelgren, 1968). In the current study, one subject exhibited a peaked zROC, and our model fits verified this prediction quantitatively. The framework introduced here may shed light on other anomalies observed in zROC data as well. Previous work has argued that decision noise is induced in rating tasks when task instructions require subjects to use the rating categories with equal frequency (Murray et al., 2002) or, more generally, when task instructions alter criterion placement from default positions that subjects would use absent any instruction (Kellen et al., 2012; Wixted & Gaitan, 2002). These authors suggest that decision noise emerges from the conflict between subjects' pre-conditioned preferences, acquired over extensive lifetime experience, and instructions that bias subjects to adopt criterion positions conflicting with these default preferences. The subjects in our study had

extensive practice in psychophysics experiments, so we expect that default preferences were moderated. Moreover, while we asked subjects to utilize the full scale, we did not request that subjects use each response category with equal frequency. Still, we remain agnostic as to whether decision noise results from conflicts between response instruction and predisposition, or whether this arises because of limitations on the resolution of a representation-response mapping, or for any other reason. The method we propose here may prove useful in determining the degree to which response instruction and subject expertise influence response variability. Ours is not the first attempt to resolve decision and representational processes in signal detection tasks. For example, Wickelgren (1968) proposed a Criterion Operating Characteristic that allowed for comparison of the variances of criteria adopted across different signal strengths. The method's validity, however, assumes equal noise standard deviations for all signal strengths. An alternative framework has been developed to separate decision and representational noise in the domain of perception with the Decision Noise Model (DNM) of Mueller and Weidemann (2008). In memory recognition, Benjamin et al. (2009) developed an Ensemble Recognition task in which participants gave confidence ratings on whether stimulus ensembles of a variable number of words were previously observed on a study list. These authors compared fits from a number of models and reached the conclusion that decision noise played a significant role in subject performance. However, Kellen et al. (2012) introduced their own model generalization approach for memory recognition that interleaved trials of a 4AFC-ranking task with those of a confidence rating procedure. These authors found no evidence of decision noise in their study and offered a critique of the conclusions drawn by Benjamin et al. The merits and shortcomings of each of these frameworks are discussed in detail in Kellen et al. (2012) and Benjamin (2013).

In our view, both the Ensemble Recognition and the model generalization approach advance our understanding of response variability considerably, although they reach contradictory conclusions about the significance of decision noise in confidence rating tasks for recognition memory. One potential limitation with both of these approaches is the strong constraints imposed between different noise components. The Ensemble Recognition paradigm assumes that a single variance term applies to the noise at all criterion boundaries. Likewise, the model generalization approach assumes either a single variance for decision noise across all criteria (adopting the LCJ as a decision rule) or a single variance for the confidence boundaries (adopting the DNM decision rule). Our own experimental results suggest that criterion noise may vary considerably across criterion boundaries when decision noise is significant (see Appendix for details). Further, the model generalization approach assumes that representational noise is constant across forced choice and rating-response paradigms, and that no decision bias (and by extension no decision noise) is present during the forced choice tasks. Though Kellen et al. argue that the decision bias observed in forced choice tasks only applies when trial stimuli are presented in sequence, the presence or absence of any such bias is ultimately unknown and is not precluded by their model. Bias has been shown to play a role in similar experimental paradigms that had previously assumed a bias-free framework (Klein, 2001; Yeshurun, Carrasco, & Maloney, 2008). If decision noise contributes to response variability in n-alternative forced choice tasks, it may appear as inflated representational noise during model fitting; this inflated estimate of representation variability may then incorrectly discount the effects of any decision noise in the corresponding rating task. More generally, the constraints imposed by these models may lead to parameter estimates that do not accurately reflect underlying processes in

representation and decision-making. Rosner and Kochanski's (2009) LCJ model allows independent parameter estimates for variance terms at the decision stage in paradigms with at least three stimulus intensities and at least four response categories. While this model provides a powerful new tool to understand categorical judgment, it does not apply to the frequently used signal detection task with two stimulus classes without introducing constraints among the noise components. The framework presented here fills that gap for tasks with at least three response categories while allowing independence among noise components. An essential feature of our approach requires the implementation of external noise. Research in recognition memory has not generally implemented this method, but the external noise method is not fundamentally incompatible with investigations of higher-level cognitive processes (Lu & Dosher, 2008, p71). For example, Tsetsos, Chater, and Usher (2012) used external noise to examine decision biases and preference reversals in the domain of economic value integration. With regard to the MCR method in particular, however, mnemonic representations of both studied and unstudied items will likely change with the number of times stimuli are presented during test trials. However, the MCR paradigm is only one of a number of methods that use multiple presentations to investigate levels of internal noise (Burgess & Colborne, 1988; Swets, Shipley, McKey, & Green, 1959). In particular, Nosofsky (1983) used multiple presentations without the use of external noise in order to estimate the representation and criterion noise in an auditory identification task. Nosofsky deployed this method to study noise contributions to the range effect, but this technique might offer a means of determining decision noise for tasks with only binary response alternatives. The Ensemble Recognition task of Benjamin et al. (2009) in the domain of recognition memory bears some resemblance to this

approach insofar as additional presentations (or larger ensemble sizes) of stimulus samples lead to less variability in processes underlying representation. Recent studies have brought to light the importance of a decision rule that resolves ambiguities that arise with noisy criterion boundaries in signal detection tasks with three or more response categories (Klauer & Kellen, 2012). When trial-sampled criteria overlap, category assignment becomes ambiguous without specific decision rules accounting for contingencies owing to positional relations among criteria and representations. However, any possible set of rules unambiguously resolving trial-sampled representations to category assignments may serve as a decision rule. Our experiments used either three or five response categories. The symmetry (or lack thereof) in the number of response categories may influence the choice of rule adopted by our subjects. Symmetric response structures have an odd number of category boundaries and an even number of response categories. These response structures might induce the adoption of an initial, central, and binary decision boundary, with participants only subsequently utilizing the remaining criteria as a confidence rating on their antecedent choice. This is dubbed a sequential rule, along with any rule whereby subjects compare trial stimuli with trial-sampled criteria in a sequential manner. Asymmetric response structures have an even number of category boundaries. Since asymmetric response structures, like the ones we examined in this study, do not naturally suggest any particular criterion as a central designation as in symmetric response structures, we restricted our examination to simultaneous rules in this article. However, rating category asymmetry may naturally allow for the emergence of a neutral category that subjects use as a preferred classification during trials with lapses of attention, and so responses may not wholly reflect categorization based on representational determinants.
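One simple simultaneous rule of the kind discussed above can be sketched as follows: reordering the trial-sampled criteria before locating the representation among them keeps category assignment unambiguous even when criterion samples cross. This is an illustrative sketch under that assumption (function name and NumPy usage are our own), not the specific contingency rules of the LCJ variants.

```python
import numpy as np

def simultaneous_response(sample, sampled_criteria):
    """Assign a rating category from one trial-sampled representation and
    one draw of (possibly disordered) noisy criteria.

    Sorting the sampled criteria first is one way to resolve the overlap
    ambiguity: the rating is the number of reordered criteria falling
    below the representation.
    """
    ordered = np.sort(np.asarray(sampled_criteria, dtype=float))
    return int(np.searchsorted(ordered, sample))
```

For example, if the criteria sampled on a given trial are [0.4, -0.2] (the upper criterion has crossed below the lower one), a representation of 0.0 still maps to the middle of three categories.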
Although we cannot determine a priori

which decision rule a subject might adopt, specific data signatures may reflect idiosyncratic strategies to deal with significantly different processing constraints in the course of encoding information and making decisions about that information. Previous studies lend weight to the idea that task instructions (explicitly; Treisman & Faulkner, 1985), response structure (implicitly), and individual subject differences (Petrov, 2009) may all influence decision rule adoption. We hope to explore alternative decision rules and hybrid rules in future studies. Klauer and Kellen (2012) showed that if an observer's criterion boundaries were centered and distributed evenly about the mean of an underlying representational distribution, the LCJ would yield asymmetric response distributions. They argued instead for a modified decision rule that determined response selection according to the proximity of an internal representation to the trial-sampled criteria and that would result in a symmetric distribution of response frequencies. We have instantiated that alternative rule here as LCJsym, but have found it underperformed relative to LCJ in our data sets for which decision noise was deemed significant. Given the limitations of our experimental study, we hesitate to make strong claims regarding the general validity of alternative decision rules in operation for specific tasks or individuals. Other tasks or experimental manipulations may very well induce subjects to adopt another decision rule such as LCJsym, and the framework introduced here may allow us to identify that rule. Experimental paradigms investigating perceptual and cognitive processes obtain information about these underlying processes by examining responses conditioned on input stimuli, task instructions, subject population, etc. In the case of an MCR procedure, we collect additional information by conditioning subject responses on specific samples of external noise.
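As a concrete illustration of what conditioning on frozen noise samples buys, response agreement across passes can be estimated directly from the aligned rating matrix. The fragment below is a minimal sketch (the function name and NumPy usage are our own, and it uses simple proportion-agreement rather than the covariance-based estimator described in this paper).

```python
import numpy as np

def pass_agreement(ratings):
    """Mean proportion of identical responses over all pairs of passes.

    ratings: (n_passes, n_trials) integer matrix in which column t holds
    the responses to the same frozen external-noise sample on every pass.
    Higher agreement implies a lower internal-to-external noise ratio.
    """
    ratings = np.asarray(ratings)
    n_passes = ratings.shape[0]
    pair_scores = [np.mean(ratings[i] == ratings[j])
                   for i in range(n_passes) for j in range(i + 1, n_passes)]
    return float(np.mean(pair_scores))
```

A perfectly consistent observer scores 1.0; agreement falls toward chance as internal (encoding plus decision) noise grows relative to the frozen external noise.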
By presenting these samples over multiple passes, we can estimate response agreement to test

more nuanced hypotheses than would be feasible otherwise. Sequential dependence, for example, may offer a potential target for investigation insofar as these dependencies introduce a form of systematic decision noise. Trial-by-trial dependencies certainly bear on estimates of agreement in multi-pass psychophysics tasks. Sequential dependencies influenced by stimulus schedule (Fernberger, 1920; Parducci, 1959), response choice (Howarth & Bulmer, 1956), or feedback (Carterette, Friedman, & Wyman, 1966) could generate greater response agreement to the degree that these factors are preserved across passes. In this case, estimates of the internal-to-external noise ratio are at a lower bound. If response dependencies artificially increase agreement estimates, then removing these dependencies will reduce covariance estimates, which in turn leads to greater estimates of internal noise (Green, 1964). Levi et al. (2005) proposed randomizing the sequence of trials from pass to pass in order to mitigate agreement effects deriving from stimulus-response dependencies. The current study followed the prescription of Levi et al. by randomizing the stimulus schedule from pass to pass, but we did not examine response data for synchronized stimulus schedules across passes. Comparing internal-to-external noise ratios measured in multi-pass experiments with and without randomized trial ordering suggests itself as one way to begin teasing apart the purely stimulus-related factors on trial outcomes from other contributions to response agreement. Elaborated observer models make more detailed claims regarding the functional mechanisms transforming stimulus inputs to overt responses (Lu & Dosher, 2008; Lu & Dosher, 2013). Many of these models emphasize the account of representational processing, but use the simplified decision processes of standard SDT. When ignored, response variability arising from decision processes will redound to representational processes instead, potentially leading to

erroneous model predictions. When task conditions call for increasing the number of response categories, decision boundaries may become more variable (Ratcliff & Starns, 2009). In these cases, observer models incorporating our framework may lead to a more detailed understanding of the transformation from stimulus to response. The aim of analyzing noise contributions is a fundamental objective in cognitive psychology. Isolating component sources of noise helps us to characterize corresponding component processes in human behavior and decision making (Brunton, Botvinick, & Brody, 2013; Ratcliff & Starns, 2009). The MCR paradigm makes available new research directions involving noise analysis and decision strategy. The importance of the MCR procedure and analyses in future research will depend upon the amount of decision noise present for a given task, subject population, and experimental condition. If the decision noise is relatively negligible, a simpler SDT model will serve as a more parsimonious and efficient explanation for the observed outcomes. The experimental results presented here suggest that decision noise is not a significant determinant for tasks with few response alternatives, but may become more influential when the number of response alternatives increases.

Conclusion

In this paper, we present a new framework that combines two well-established procedures in psychophysics: a confidence rating response procedure and a multi-pass experimental paradigm. In combination, these procedures allow estimation of response agreement as well as response frequency for each response category. We provide evidence that data collected with this framework sufficiently constrains extended SDT models with decision noise. Our simulation study showed that the parameters of a decision noise model fitted to responses from simulated

experiments led to increasing accuracy and precision with increasing trials and passes. These simulations also demonstrated that decision noise models matched to the decision rule adopted by the subject will outperform mismatched models. We also conducted a visual detection rating experiment with multiple passes. Our results showed that decision noise was negligible when subjects responded with three confidence rating categories, but that it influenced trial responses with as few as five response categories. For tasks with few response alternatives, classical SDT may adequately account for the observed data. But for tasks offering a large number of response alternatives, or where decision noise is suspected, the framework presented here offers a more detailed description of the underlying processes.

Acknowledgments

The authors would like to express their gratitude to Bosco Tjan, David Kellen, and three anonymous reviewers for their especially helpful suggestions and comments. Additional thanks to lab members of LOBES and T-Lab for their very useful feedback during the research and writing of this study. This research was supported by Grant MH8118 from the National Institutes of Health and Grant EY17491 from the National Eye Institute.

References

Barlow, H.B. (1957). Increment thresholds at low intensities considered as signal/noise discriminations. The Journal of Physiology, 136(3), 469-488.
Benjamin, A.S. (2013). Where is the criterion noise in recognition? (Almost) everyplace you look: Comment on Kellen, Klauer, and Singmann (2012). Psychological Review, 120(3), 720-726.
Benjamin, A.S., Diaz, M., & Wee, S. (2009). Signal detection with criterion noise: Applications to recognition memory. Psychological Review, 116(1), 84-115.
Benjamin, A.S., Tullis, J.G., & Lee, J.H. (2013). Criterion noise in ratings-based recognition: Evidence from the effects of response scale length on recognition accuracy. Journal of Experimental Psychology: Learning, Memory, and Cognition, 39(5), 1601-1608.
Braida, L.D., & Durlach, N.I. (1972). Intensity perception: II. Resolution in one-interval paradigms. Journal of the Acoustical Society of America, 51, 483-502.
Brainard, D.H. (1997). The Psychophysics Toolbox. Spatial Vision, 10, 433-436.
Britten, K.H., Shadlen, M.N., Newsome, W.T., & Movshon, J.A. (1992). The analysis of visual motion: A comparison of neuronal and psychophysical performance. The Journal of Neuroscience, 12(12), 4745-4765.
Brunton, B.W., Botvinick, M.M., & Brody, C.D. (2013). Rats and humans can optimally accumulate evidence for decision-making. Science, 340, 95-98.
Burgess, A.E., & Colborne, B. (1988). Visual signal detection. IV. Observer inconsistency. Journal of the Optical Society of America A, Optics and Image Science, 5(4), 617-627.
Busemeyer, J.R., & Diederich, A. (2010). Cognitive Modeling. SAGE.
Carterette, E.C., Friedman, M.P., & Wyman, M.J. (1966). Feedback and psychophysical variables in signal detection. The Journal of the Acoustical Society of America, 39, 1051-1055. doi:10.1121/1.1909991
DeCarlo, L.T. (2002). Signal detection theory with finite mixture distributions: Theoretical developments with applications to recognition memory. Psychological Review, 109(4), 710-721.
Dosher, B. & Lu, Z-L. (1998). Perceptual learning reflects external noise filtering and internal noise reduction through channel selection. Proceedings of the National Academy of Sciences, USA, 95, 13988-13993.
Dosher, B. & Lu, Z-L. (1999). Mechanisms of perceptual learning. Vision Research, 39, 3197-3221.
Durlach, N.I., & Braida, L.D. (1969). Intensity perception. I. Preliminary theory of intensity resolution. The Journal of the Acoustical Society of America, 46(2), 372-383.

Egan, J.P., Schulman, A.I., & Greenberg, G.Z. (1959). Operating characteristics determined by binary decisions and by ratings. Journal of the Acoustical Society of America, 31(6), 768-773.
Fechner, G. (1860). Elemente der Psychophysik. Leipzig: Breitkopf & Härtel.
Fernberger, S.W. (1920). Interdependence of judgments within the series for the method of constant stimuli. Journal of Experimental Psychology, 3(2), 126-150.
Friedman, M.P., Carterette, E.C., Nakatani, L., & Ahumada, A. (1968). Comparisons of some learning models for response bias in signal detection. Perception & Psychophysics, 3, 5-11. doi:10.3758/bf321273
Gold, J., Bennett, P.J., & Sekuler, A.B. (1999). Signal but not noise changes with perceptual learning. Nature, 402(6758), 176-178. doi:10.1038/46027
Graham, N.V.S. (1989). Visual Pattern Analyzers. New York: Oxford University Press.
Gravetter, F. & Lockhead, G.R. (1973). Criterial range as a frame of reference for stimulus judgment. Psychological Review, 80(3), 203-216.
Green, D.M. (1964). Consistency of auditory detection judgments. Psychological Review, 71, 392-407.
Green, D.M. (1995). Maximum-likelihood procedures and the inattentive observer. Journal of the Acoustical Society of America, 97(6), 3749-3760.
Green, D.M. & Swets, J.A. (1966). Signal Detection Theory and Psychophysics. New York: John Wiley & Sons.
Howarth, C.I. & Bulmer, M.G. (1956). Non-random sequences in visual threshold experiments. Quarterly Journal of Experimental Psychology, 8(4), 163-171.
Hutchinson, T.P. (1981). A review of some unusual applications of signal detection theory. Quality and Quantity, 15, 71-98.
Kac, M. (1962). A note on learning signal detection. IRE Transactions on Information Theory, IT-8, 126-128.
Klauer, K.C. & Kellen, D. (2012). The law of categorical judgment (corrected) extended: A note on Rosner and Kochanski (2009). Psychological Review, 119(1), 216-220. doi:10.1037/a0025824
Klein, S.A. (2001). Measuring, estimating, and understanding the psychometric function: A commentary. Perception & Psychophysics, 63, 1421-1455.
Kellen, D., Klauer, K.C., & Singmann, H. (2012). On the measurement of criterion noise in signal detection theory: The case of recognition memory. Psychological Review, 119(3), 457-479. doi:10.1037/a0027727
Lesmes, L.A., Jeon, S-T., Lu, Z-L., & Dosher, B.A. (2006). Bayesian adaptive estimation of threshold versus contrast external noise functions: The quick TvC method. Vision Research, 46, 3160-3176.

Levi, D.M., & Klein, S.A. (2003). Noise provides some new signals about the spatial vision of amblyopes. The Journal of Neuroscience, 23(7), 2522-2526.
Levi, D.M., Klein, S.A., & Chen, I. (2005). What is the signal in noise? Vision Research, 45(14), 1835-1846.
Li, X., Lu, Z-L., Xu, P., Jin, J., & Zhou, Y. (2003). Generating high gray-level resolution monochrome displays with conventional computer graphics cards and color monitors. Journal of Neuroscience Methods, 130(1), 9-18.
Lu, Z-L. & Sperling, G. (1999). Second-order reversed phi. Perception & Psychophysics, 61, 1075-1088.
Lu, Z-L. & Dosher, B.A. (1999). Characterizing human perceptual inefficiencies with equivalent internal noise. Journal of the Optical Society of America A, 16(3), 764-778.
Lu, Z-L. & Dosher, B.A. (2008). Characterizing observer states using external noise and observer models: Assessing internal representations with external noise. Psychological Review, 115(1), 44-82.
Lu, Z-L. & Dosher, B.A. (2013). Visual Psychophysics: From Laboratory to Theory. The MIT Press.
Luce, R.D., & Nosofsky, R.M. (1984). Attention, stimulus range, and identification of loudness. In S. Kornblum & J. Requin (Eds.), Preparatory States and Processes (pp. 3-25). Hillsdale, NJ: Erlbaum.
Luce, R.D., Nosofsky, R.M., Green, D.M., & Smith, A.F. (1982). The bow and sequential effects in absolute identification. Perception & Psychophysics, 32, 397-408.
Macmillan, N.A. & Creelman, C.D. (2004). Detection Theory: A User's Guide. Mahwah, New Jersey: Lawrence Erlbaum Associates.
Macmillan, N.A., Kaplan, H.L., & Creelman, C.D. (1977). The psychophysics of categorical perception. Psychological Review, 84, 452-471.
McClelland, G.H. (2011). Use of signal detection theory as a tool for enhancing performance and evaluating tradecraft in intelligence analysis. In B. Fischhoff & C. Chauvin (Eds.), Intelligence Analysis: Behavioral and Social Scientific Foundations (pp. 83-100).
McFall, R.M. & Treat, T.A. (1999). Quantifying the information value of clinical assessments with signal detection theory. Annual Review of Psychology, 50, 215-241.
Miller, G.A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63(2), 81-97. doi:10.1037/h0043158
Mueller, S.T., & Weidemann, C.T. (2008). Decision noise: An explanation for observed violations of signal detection theory. Psychonomic Bulletin & Review, 15(3), 465-494.
Murray, R.F., Bennett, P.J., & Sekuler, A.B. (2002). Optimal methods for calculating classification images: Weighted sums. Journal of Vision, 2(1), 79-104.

Nosofsky, R.M. (1983). Information integration and the identification of stimulus noise and criterial noise in absolute judgment. Journal of Experimental Psychology: Human Perception and Performance, 9(2), 299-309. doi:10.1037/0096-1523.9.2.299
Parducci, A. (1959). An adaptation-level analysis of ordinal effects in judgment. Journal of Experimental Psychology, 58(3), 239-246.
Parks, T.E. (1966). Signal-detectability theory of recognition-memory performance. Psychological Review, 73(1), 44-58.
Pelli, D.G. (1990). The quantum efficiency of vision. In C. Blakemore (Ed.), Vision: Coding and Efficiency (pp. 3-24).
Pelli, D.G. (1997). The VideoToolbox software for visual psychophysics: Transforming numbers into movies. Spatial Vision, 10, 437-442.
Peterson, W.W., Birdsall, T.G., & Fox, W.C. (1954). The theory of signal detectability. Information Theory, IRE Professional Group, 4(4), 171-212. doi:10.1109/TIT.1954.1057460
Petrov, A.A. (2009). Symmetry-based methodology for decision-rule identification in same-different experiments. Psychonomic Bulletin & Review, 16(6), 1011-1025.
Pollack, I. (1952). The information of elementary auditory displays. Journal of the Acoustical Society of America, 24(6), 745-749. doi:10.1121/1.1906969
Ratcliff, R. & Starns, J.J. (2009). Modeling confidence and response time in recognition memory. Psychological Review, 116(1), 59-83.
Rosner, B.S., & Kochanski, G. (2009). The law of categorical judgment (Corrected) and the interpretation of changes in psychophysical performance. Psychological Review, 116(1), 116-128. doi:10.1037/a0014463
Sorkin, R.D., & Dai, H. (1994). Signal detection analysis of the ideal group. Organizational Behavior and Human Decision Processes, 60, 1-13.
Sorkin, R.D., Hays, C.J., & West, R. (2001). Signal detection analysis of group decision making. Psychological Review, 108(1), 183-203.
Spiegel, M.F., & Green, D.M. (1981). Two procedures for estimating internal noise. The Journal of the Acoustical Society of America, 70(1), 69-73. doi:10.1121/1.386583
Swets, J.A., Shipley, E.F., McKey, M.J., & Green, D.M. (1959). Multiple observations of signals in noise. The Journal of the Acoustical Society of America, 31(4), 514-521. doi:10.1121/1.1907745
Tanner, W.P., Jr., & Swets, J.A. (1954). A decision-making theory of visual detection. Psychological Review, 61(6), 401-409. doi:10.1037/h0058700
Tanner, W.P., Jr. (1961). Physiological implications of psychophysical data. Annals of the New York Academy of Sciences, 89(5), 752-765. doi:10.1111/j.1749-6632.1961.tb20176.x

Thomas, E.A.C. (1973). On a class of additive learning models: Error-correcting and probability matching. Journal of Mathematical Psychology, 10(3), 241-264.
Thomas, E.A.C. (1975). Criterion adjustment and probability matching. Perception & Psychophysics, 18(2), 158-162.
Torgerson, W.S. (1958). Theory and Methods of Scaling. New York: Wiley.
Treisman, M. (1984). A theory of criterion setting: An alternative to the attention band and response ratio hypotheses in magnitude estimation and cross-modality matching. Journal of Experimental Psychology: General, 113(3), 443-463.
Treisman, M., & Faulkner, A. (1984). The setting and maintenance of criteria representing levels of confidence. Journal of Experimental Psychology: Human Perception and Performance, 10(1), 119-139. doi:10.1037/0096-1523.10.1.119
Treisman, M., & Faulkner, A. (1985). Can decision criteria interchange locations? Some positive evidence. Journal of Experimental Psychology: Human Perception and Performance, 11(2), 187-208.
Treisman, M., & Williams, T.C. (1984). A theory of criterion setting with an application to sequential dependencies. Psychological Review, 91(1), 68-111.
Treutwein, B. (1995). Adaptive psychophysical procedures. Vision Research, 35(17), 2503-2522.
Tsetsos, K., Chater, N., & Usher, M. (2012). Salience driven value integration explains decision biases and preference reversal. Proceedings of the National Academy of Sciences, 109(24), 9659-9664.
Wichmann, F.A. & Hill, N.J. (2001). The psychometric function: I. Fitting, sampling, and goodness of fit. Perception & Psychophysics, 63(8), 1293-1313.
Wickelgren, W.A. (1968). Unidimensional strength theory and component analysis of noise in absolute and comparative judgments. Journal of Mathematical Psychology, 5(1), 102-122. doi:10.1016/0022-2496(68)90059-X
Wickelgren, W.A., & Norman, D.A. (1966). Strength models and serial position in short-term recognition memory. Journal of Mathematical Psychology, 3(2), 316-347.
Wixted, J.T., & Gaitan, S.C. (2002). Cognitive theories as reinforcement history surrogates: The case of likelihood ratio models of human recognition memory. Animal Learning & Behavior, 30(4), 289-305.
Wonnacott, T.H., & Wonnacott, R.J. (1981). Regression: A Second Course in Statistics. New York: John Wiley & Sons.
Yeshurun, Y., Carrasco, M., & Maloney, L.T. (2008). Bias and sensitivity in two-interval forced choice procedures: Tests of the difference model. Vision Research, 48, 1837-1851.

Table 1
Degrees of freedom in rating procedure tasks

Procedure   Data points   Free parameters
Rating      2M            < 2M + 3
MCR         2 × 2M        > 2M + 3

Table 2
Parameter Configurations for Simulation Study

Encoding Noise      Decision Noise
Equal, Ascending    Equal, Ascending, Descending

Table 3
Payoff Matrix for Rating Detection Task

                  Subject response
Trial type         1    2a    3a    4a    5
Signal absent      2    1     0    -1    -3
Signal present    -3   -1     0     1     2

a The response alternatives indicated in gray represent the payoff structure for the three response category rating task, while the entire response range represents the payoff structure for the five response category rating task.
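To make the payoff structure concrete, the sketch below computes the expected payoff per trial for a hypothetical observer under a five-category matrix like Table 3. The zero payoffs assigned to the middle response category and the response probabilities are illustrative assumptions, not values reported in this study.

```python
import numpy as np

# Payoff matrix for the five-category rating task (rows: trial type;
# columns: responses 1-5). The middle-category payoffs of 0 are an assumption.
payoff = np.array([[ 2,  1, 0, -1, -3],   # signal absent
                   [-3, -1, 0,  1,  2]])  # signal present

# Hypothetical response probabilities per trial type (illustrative only).
p_resp = np.array([[0.4, 0.3, 0.1, 0.1, 0.1],
                   [0.1, 0.1, 0.1, 0.3, 0.4]])

# Expected payoff per trial, assuming equal numbers of absent and present trials.
expected = 0.5 * np.sum(payoff * p_resp)
```

A payoff matrix of this symmetric form rewards confident correct responses most and penalizes confident errors most, which discourages a degenerate strategy of always using the neutral middle category.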

Table 4
Parameter estimates for three and five response categories (joint model fits)
Model Parameters (in units of σext) and χ2
Sub Mod 2 Criteria Representation 4 Criteria χ2 μc1 μc2 σc1 σc2 σe σe1 μs1 csdt YZ - - μc2 μc3 μc4 σc1 σc2 σc3 σc4 σe σe1 μs1 1.47 1.52 1.72 7.18 -1.85 -.6 1.25 3.29 - - - - χ2 1.48 1.51 1.64 22.38 LCJ -.37 1.3 1.28 1.25 1.2 1.2 1.66 1.51 -1.9 LCJc -.93 1.22 1.3 1.49 1.49 -2.1 -.36 1.11 3.22 1.36 1.9 .45 .4 1.36 1.36 1.68 1.88 -.87 1.21 .9 1.48 1.49 1.69 3.3 -1.83 .29 1.2 3.25 .23 .94 .34 .2 1.41 1.41 1.7 11.78 LCJsym CC -.8 1.22 μc1 Representation 1.7 2.67 .31 1.21 3.24 .92 .74 .97 .38 1.32 1.32 1.7 8.59 csdt 1.1 1.68 - - 1.65 1.7 LCJ 1.1 1.7 1.66 1.75 2.57 2.27 -.36 .98 1.52 3.73 1.65 .7 .3 .39 1.74 1.95 2.62 6.66 LCJc 1.5 1.5 .25 1.5 1.62 2.46 1.73 -1.22 .91 1.41 3.56 1.29 LCJsym 1.31 1.4 .5 .3 1.37 1.41 2.45 1.74 -.99 .88 1.36 3.44 2.52 2.57 -1.19 1 1.57 3.79 -1 - - - 1.74 2.3 2.77 19.8 .2 .72 1.5 1.86 2.51 9.41 .2 1.41 1.8 2.41 1.77 .8

Table 5
Model fits to subject data of three and five rating categories with identical representation parameters
Model Parameters (in units of σext)

Subject | 2 Criteria: μc1, μc2, σc1, σc2 | 4 Criteria: μc1, μc2, μc3, μc4, σc1, σc2, σc3, σc4 | Representation: σe, σe1, μs1
YZ      | -.75, 1.22, -, -               | -1.9, .26, 1.24, 3.31, .43, .71, .76, .8           | 1.43, 1.43, 1.73
CC      | 1.13, 1.75, -, -               | -.36, .99, 1.52, 3.73, 1.64, .7, .3, .39           | 1.73, 1.95, 2.62

Table 6
Parameter estimates for LCJ fit to ROC-only vs. full data sets for five response categories
Model Parameters (in units of total representational noise on signal-absent trials)

Subject   Data Modeled   μc1     μc2   μc3   μc4    σr1   μs1
YZ        ROC            -1.14   .19   .72   1.97   1     1.2
YZ        ROC + Cov      -1.15   .19   .73   1.96   1     1.3
CC        ROC            -.19    .49   .75   1.85   1.8   1.33
CC        ROC + Cov      -.18    .49   .76   1.86   1.9   1.3

Table 7
Median decision noise model parameter estimates for simulated data from mixture distributions
Model Parameters (in units of σext)

Model Type      μc1   μc2   σc1   σc2   σe   σe1    μs1    λ
Mixture Model   .7    .71   -     -     1    1      1.41   .5
LCJ             .7    .66   1     1.5   1.29 -
LCJc            .5    .65   1     1.1   1.28 -
LCJsym          .2    .58   .98   .99   1.19 -

Figure 1: Top: decision axis under a classical confidence rating framework. Representations of signal-absent and signal-present distributions take the form of Gaussian probability density functions. The subject uses static criteria to partition the decision axis in order to map internal representations to overt responses. Bottom: a modified confidence rating framework in which the criteria are formulated as probability density functions with means μc1, μc2, and μc3 due to trial-by-trial variability in decision processes. In this and later figures, probability density functions for criterion noise are shown reflected below the decision axis for clarity.

Figure 2: Left: An ROC with three different decision criteria. When the signal strength is low, performance decreases, values of HR and FAR converge, and the ROC curve approaches the unity slope. With higher signal strength, HR and FAR diverge, so the ROC curve moves up and to the left. Right: underlying distributions of stimulus representations at the decision stage shown with high encoding noise and low decision noise (top panel) and an alternative representation with lower encoding noise and higher decision noise (bottom panel), each leading to the same performance outcome.

Figure 3: Left: a multi-pass procedure contains at least two runs with identical samples of external noise added to corresponding trial stimuli within each pass. Corresponding trials need not be presented according to the same stimulus schedule for each pass, but we match external noise samples with trial order here for the purpose of illustration. Right: Measures of agreement (percent agreement, covariance, correlation) between responses to corresponding trials across passes provide additional behavioral measures to help constrain observer models.
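The agreement measures named in the caption can be computed directly from two passes of rating responses. A minimal sketch with synthetic data standing in for real responses (the 70% response-repeat rate is an arbitrary illustrative value):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic ratings (categories 1-5) for 200 noise-matched trials in pass 1,
# and a pass 2 that repeats pass 1's response on roughly 70% of trials.
pass1 = rng.integers(1, 6, size=200)
resample = rng.integers(1, 6, size=200)
pass2 = np.where(rng.random(200) < 0.7, pass1, resample)

percent_agreement = np.mean(pass1 == pass2)    # proportion of identical responses
covariance = np.cov(pass1, pass2)[0, 1]        # covariance across passes
correlation = np.corrcoef(pass1, pass2)[0, 1]  # Pearson correlation
```

With real data, higher agreement for matched external noise samples (relative to chance agreement) is what licenses inferences about the internal-to-external noise ratio.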

Figure 4: Left: Two overlapping ROCs generated using a decision rule described by Rosner and Kochanski (2009; see decision rules below) and assuming two different underlying parameter sets. Parameters 1 (open symbols): encoding noise is 1 for both signal-absent and signal-present trials; mean of the signal distribution is 1; criteria are located at -0.62, 0, 0.5, 1 with criterion noise at 0.1 for all criteria. Parameters 2 (X's): encoding noise is 0.8 for signal-absent trials, 0.4 for signal-present trials; signal mean is 0.92; criteria are located at -0.15, 0, 0.5, 0.77 with corresponding criterion noise of 0.125, 1, 0.3, 0.2. All quantities are given in units of the consistent noise, σext. Right: covariance outcomes using the same two underlying parameter sets result in discriminably different data patterns. Within-category covariances are denoted as [r, r] and lie within the gray bar. Between-category covariances lie outside the gray bar. Blue symbols mark within- and between-category covariances for response "2"; red for response "3"; black for response "4"; and magenta shows within-category covariance for response "5". For example, between-category covariances for response categories 3 and 5 across passes are shown as a red diamond and an X at the position [r, r+2] along the abscissa.

Figure 5: Left-top: decision space for a classical confidence rating signal detection task with no decision noise. Criterion locations lie at the means of the signal-absent and signal-present distributions. Left-center and bottom: decision space showing joint distributions when decision noise equal to the representational noise is selectively added to the more lax criterion. The center of the concentric circles represents the mean position of the lax criterion along the ordinate, and the mean position of the signal-absent distribution (center) and signal-present distribution (bottom) along the abscissa. Straight blue lines represent mean criterion positions. Numbers overlaying joint distributions denote expected response categories for trial-sampled criteria and representations falling in these regions. Right: zROC (top) and covariance data (bottom) for a classical signal detection task without decision noise (circles and diamonds) and with decision noise equal to representational noise at the more lax criterion (X's). Within-category covariance data lie within the gray bar; between-category covariance data lie outside the gray bar. Covariance data indicating a response of "2" in at least one pass are blue; within-category covariance for response "3" in both passes is labeled in red. See main text for more details.

Figure 6: Criterion overlap and stimulus-response mapping for three different decision rules. Random trial-by-trial sampling may lead to ordinal rearrangement of criteria (C1 and C2). The encircled letters A, B, C, and D denote different positions of trial-sampled stimulus representations falling along the decision axis. An observer requires an explicit decision rule to map the internal representation to a response. Under each stimulus representation, the columns of the Observer Response table show how an observer operating under the LCJ, LCJc, and LCJsym decision rules classifies each stimulus representation above. See main text for response mapping protocols.

Figure 7: Probability density functions for six representative parameter configurations underlying response behavior for simulated observers. Density functions represent signal-absent trials (mean zero), signal-present trials (with greater mean values), and criterion noise (reflected downward across the decision axis). DN: decision noise; EN: encoding noise.
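The kind of simulated observer behind these density functions can be sketched as follows. The parameter values and the decision rule used here (count the jittered criteria falling below the sampled representation) are illustrative assumptions, not the specific rules or parameter sets compared in the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_pass(n_trials, signal, mu_s=1.5, sigma_e=1.0,
                  mu_c=(-0.5, 0.5, 1.5), sigma_c=0.3):
    """One pass of a rating task with encoding noise and trial-by-trial
    criterion noise; returns ratings in 1..len(mu_c)+1."""
    mu = mu_s if signal else 0.0
    x = rng.normal(mu, sigma_e, n_trials)                 # encoded representations
    c = rng.normal(mu_c, sigma_c, (n_trials, len(mu_c)))  # jittered criteria
    return 1 + np.sum(x[:, None] > c, axis=1)             # count criteria exceeded

present = simulate_pass(1000, signal=True)
absent = simulate_pass(1000, signal=False)
```

Because criteria are resampled on every trial, their order can occasionally invert; the counting rule above is one simple way to resolve such inversions into a single rating.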

Figure 8: Standard error (SE) of parameter fits to data from simulated experiments for different pass-trial and parameter configurations. DN: decision noise; EN: encoding noise. Average SE across all parameters is given for four passes/experiment (cyan circles) or six passes/experiment (black squares). Maximum SE among parameters is given by asterisks and triangles. All parameter configurations show less variability in parameter fits with increasing trials and passes.

Figure 9: Average mean squared error (MSE) of parameter fits to simulated data for various pass-trial and parameter configurations. Average MSE across all parameters is given for four passes/experiment (cyan circles) or six passes/experiment (black squares). Maximum MSE among parameters is given by asterisks and triangles. (Maximum for DN-asc EN-equ at 25 trials, 4 passes is .465; not shown in order to preserve scale.)

Figure 10: Top and middle rows: average log mean-squared error (MSE) for model fits vs. trials/pass (assuming six passes/experiment) for the LCJ, LCJc, and LCJsym matched to data simulated using the LCJ decision rule. Bottom: average MSE for model fits to simulations when decision noise and encoding noise are equal across criteria and stimulus classes. Bottom left: LCJ, LCJc, and LCJsym fitted to data simulated using the LCJc decision rule. Bottom right: LCJ, LCJc, and LCJsym fitted to data simulated using the LCJsym decision rule.

Figure 11: Top and middle rows: average log χ2 for model fits vs. trials/pass (assuming six passes/experiment) for the LCJ, LCJc, and LCJsym matched to data simulated using the LCJ decision rule. Bottom: log χ2 for model fits to simulations when decision noise and encoding noise are equal across criteria and stimulus classes. Bottom left: LCJ, LCJc, and LCJsym fitted to data simulated using the LCJc decision rule. Bottom right: LCJ, LCJc, and LCJsym fitted to data simulated using the LCJsym decision rule.

Figure 12: z-score plots for subjects CC and YZ with three and five rating categories. Points on the zROC from experiments with three rating categories lie above the best-fitting line to points estimated from experiments with five rating categories. This result may reflect increasing decision noise with the use of additional response criteria.
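zROC points in plots like these come from inverse-normal transforms of cumulative hit and false-alarm rates. A minimal sketch with made-up rates (not the subjects' data):

```python
from statistics import NormalDist

# Hypothetical cumulative proportions of responses at or above each rating
# criterion, for signal-present (hit rate) and signal-absent (false-alarm
# rate) trials; illustrative values only.
hr = [0.95, 0.80, 0.60, 0.35]
far = [0.70, 0.40, 0.20, 0.08]

z = NormalDist().inv_cdf
zroc = [(z(f), z(h)) for h, f in zip(hr, far)]  # (zFAR, zHR) points to plot
```

Under classical SDT with Gaussian representations, these points fall on a straight line whose slope is the ratio of signal-absent to signal-present noise; systematic deviations across rating-scale lengths are the kind of signature attributed here to decision noise.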

Figure 13: Classical SDT (three response categories) and LCJ (five response categories) model fits for zROC and covariance data for subjects CC and YZ. Covariance graphs: the point [r, r] (gray bars) corresponds to within-category covariance, while all other points correspond to between-category covariance.