Jackknife-based method for measuring LRP onset latency differences

Psychophysiology, 35 (1998). Cambridge University Press. Printed in the USA. Copyright 1998 Society for Psychophysiological Research

METHODOLOGY

Jackknife-based method for measuring LRP onset latency differences

JEFF MILLER,a TUI PATTERSON,a and ROLF ULRICHb
a Department of Psychology, University of Otago, Dunedin, New Zealand
b Department of Psychology, University of Wuppertal, Germany

Abstract

A new method based on jackknifing is presented for measuring the difference between two conditions in the onset latencies of the lateralized readiness potential (LRP). The method can be used with both stimulus- and response-locked LRPs, and simulations indicate that it provides accurate estimates of onset latency differences in many common experimental conditions.

Descriptors: Lateralized readiness potential, Onset latency measurement, Jackknifing, Computer simulations

Author note: This work was supported by cooperative research funds from the Deutsche Raum- und Luftfahrtgesellschaft e.V. and the New Zealand Ministry of Research, Science, and Technology. During her work on this project, Tui Patterson was supported by an Otago University summer bursary. We thank Michael Coles, Patricia Haden, Aaron Ilan, Bert Mulder, Allen Osman, and Fren Smulders for helpful comments on earlier drafts of the manuscript. Address reprint requests to: Jeff Miller, Department of Psychology, University of Otago, Dunedin, New Zealand; miller@otago.ac.nz.

In recent years, the lateralized readiness potential (LRP) has become an important tool in psychophysiological studies of choice reaction time (RT) tasks (e.g., Coles, 1989). In such tasks, a stimulus is presented and people must respond as quickly as possible. A classic problem in the study of such tasks is to determine which mental processes are influenced by an experimental manipulation, such as stimulus probability. For example, it is easy to establish that people respond faster to probable stimuli than to improbable stimuli, but discerning whether this difference in RT is due to faster processing at the level of perception, decision making, or motor response is difficult (e.g., Gehring, Gratton, Coles, & Donchin, 1992; Miller & Pachella, 1973). In principle, this problem can be addressed by using the LRP, an electrophysiological indicator of response preparation that is obtained by comparing electroencephalographic (EEG) activity over the left and right motor cortices prior to movements of the left and right hands (for a complete description of LRP derivation, see Coles, 1989). The onset of the LRP provides a time marker intervening between stimulus and response, indicating the beginning of side-specific response preparation (Coles, 1989), and this time marker can be used to localize the effects of experimental manipulations. For example, an experimenter could compare the time between stimulus onset and LRP onset for high and low probability stimuli. If this time is the same for the two types of stimuli, then it can be inferred that processing of high probability stimuli is faster after the moment when response preparation begins. Alternatively, if this time is shorter for high probability stimuli, then it can be inferred that processing is faster before response preparation begins. This general inferential technique can be used with almost any experimental manipulation. Unfortunately, determining exactly when LRP onset occurs is difficult because EEG signals have a low signal-to-noise ratio.
Previous investigators have devised and used various methods to determine differences in LRP onset latencies but have reported evaluations of the efficiency of their methods in only one case (Smulders, Kenemans, & Kok, 1996). This article presents a class of new methods that are computationally simpler than previous methods and reports computer simulations indicating that the new methods are also more accurate.

Measurement of Latencies Versus Measurement of Latency Differences

Previous investigators have focused on accurate measurement of LRP latency within a condition. Viewed from this perspective, the problem is to determine what level the measured LRP might reach by chance when true LRP onset had not yet occurred. A criterion for LRP onset can then be set at a level just beyond what is likely to be reached by chance, and LRP onset can be estimated as the moment at which the LRP reaches this criterion level. Both parametric (Osman, Bashore, Coles, Donchin, & Meyer, 1992) and nonparametric (Van Dellen, Brookhuis, Mulder, Okita, & Mulder, 1985) statistical techniques have been used to determine when the LRP exceeds chance levels. Regardless of the statistical technique, however, the problem is extremely difficult for at least three reasons: (a) there is considerable noise in the LRP, (b) at its onset, the LRP usually rises gradually rather than sharply, and (c) statistical testing at many time points, which is necessitated by the high frequencies at which EEG is sampled, inflates the Type I error probability by providing many opportunities to conclude incorrectly that LRP onset has occurred.

In contrast, the new approach suggested in the present study focuses on accurate measurement of the difference in latencies between two conditions. Specifically, we suggest measuring in each condition the latency at which the LRP reaches a fairly large criterion value, much larger than the minimum LRP needed to say that LRP onset has occurred (cf. Smulders et al., 1996). Clearly, each of these latencies will tend to be too large in isolation because the large criterion will only be reached some time after onset. Nonetheless, as long as both latencies are too large by the same amount, the difference in latencies will accurately reflect the true difference in onset latencies (see Footnote 1). In short, we acknowledge that the new procedure may produce a poor measure of LRP onset latencies per se but nonetheless argue that this technique produces a good measure of differences in onset latencies. Thus, the new procedure should be especially useful in situations in which differences are of primary theoretical interest, although other methods may also be useful for determining absolute rather than relative LRP onset latencies.

Footnote 1: The two latency estimates might be too large by different amounts in cases in which the initial portions of the two LRPs have different shapes, either because of differences in response activation or, for stimulus-locked LRPs, because of differences in onset variability of the response process. We are primarily concerned with common cases in which the initial portion of the LRP does not differ much in shape across conditions, but the effects of LRP shape differences on the new method will be considered in a later section.

The new approach begins with a simple measure of the onset latency within a given condition, also used by Smulders et al. (1996). A criterion level of LRP, say 0.5 µV, is chosen large enough to be certain that this level would not be crossed by chance in the grand-average waveforms. LRP onset latency in each condition is then measured as the first time point at which the grand-average LRP for that condition exceeds this criterion. Figure 1, for example, shows the stimulus- and response-locked LRPs obtained in two conditions, with cutoffs of 0.5 µV used to obtain stimulus-locked LRP onsets of 208 ms and 280 ms and response-locked LRP onsets of -216 ms and -244 ms in the control and experimental conditions, respectively. Thus, for each type of waveform, the estimated latency difference, D, is the difference between these values. Although the latency estimates could be questioned in absolute terms, the corresponding differences of D = 280 - 208 = 72 ms for the stimulus-locked waveforms and D = (-244) - (-216) = -28 ms for the response-locked waveforms seem to be reasonable summaries of the data.

Figure 1. Examples of measured lateralized readiness potential (LRP) latency differences, D, in observed stimulus- and response-locked LRPs. In both types of waveforms, latencies were scored with a criterion of 0.5 µV.

Measurement of onset latencies in this fashion suffers minimally from each of the three problems just described as plaguing previous methods. First, LRP noise is kept to a minimum because grand averages are used. Second and more importantly, the problematic gradual initial increase is avoided by setting a high criterion for onset, which will only be reached in the sharply rising portion of the LRP, where noise has less effect on the estimated latency. Third, no explicit statistical testing is done to identify the moment of onset, so there is no need to worry about obtaining a significant LRP by chance given a preselected alpha level.
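To make the scoring rule concrete, the following minimal sketch (in Python; the article itself gives no code, so the function and variable names, the time grid, and the choice of a per-waveform relative criterion are illustrative assumptions) finds the first sample at which a grand-average LRP exceeds a criterion and takes the difference D between two conditions.

import numpy as np

def onset_latency(grand_avg_lrp, times_ms, criterion_uv):
    # First time (ms) at which the grand-average LRP exceeds the criterion;
    # returns None if the criterion is never reached.
    above = np.nonzero(grand_avg_lrp > criterion_uv)[0]
    return None if above.size == 0 else times_ms[above[0]]

def latency_difference(lrp_control, lrp_experimental, times_ms, rel_criterion=0.5):
    # Estimated difference D (experimental minus control), scored on grand averages.
    # Here the criterion is defined relative to each waveform's own maximum amplitude;
    # an absolute criterion (e.g., 0.5 microvolts) can be passed to onset_latency instead.
    # The sketch assumes the criterion is actually reached in both conditions.
    t_c = onset_latency(lrp_control, times_ms, rel_criterion * lrp_control.max())
    t_e = onset_latency(lrp_experimental, times_ms, rel_criterion * lrp_experimental.max())
    return t_e - t_c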
Estimating the Standard Error of the Latency Difference

As described so far, the proposed new method produces a single estimate of the latency difference between two conditions based on the overall mean LRP across all participants (cf. Smulders et al., 1996). It is also necessary, however, to have an estimate of the standard error of this difference in order to test hypotheses about the true difference (e.g., is it zero?) and to construct confidence intervals.

Classically, the standard error of a difference is estimated with the following formula. If d_i, i = 1, ..., N, are the latency differences observed for each of N participants and \bar{d} is the mean of those differences, then the standard error of the mean difference, s_{\bar{d}}, is

s_{\bar{d}} = \sqrt{ \sum_{i=1}^{N} (d_i - \bar{d})^2 / [N(N - 1)] }.   (1)

In essence, this formula uses a measure of the individual variation of the difference scores (i.e., the numerator of the fraction) to estimate the random error associated with the sample summary statistic (i.e., the mean difference score). Unfortunately, the classical approach to the computation of standard error may not work well in the case of LRP onset latency differences because the individual-subject LRPs contain much more noise than the grand-average LRPs. Even if the experimenter chooses a fairly large criterion value, there is some chance that an individual participant's LRP will cross the criterion by chance in one condition or another. Moreover, when the criterion is set to a reasonably large value, there is also a substantial chance that an individual participant's LRP will never reach that criterion at all, leading to an awkward problem of missing data.

As part of the new method, we propose to use an alternative technique known as jackknifing (Efron, 1981; Jackson, 1986; Miller, 1974; Mosteller & Tukey, 1977) to measure the standard error, s_D, of the difference. Using this technique, a researcher would compute the values D_i, i = 1, ..., N, where D_i is the difference in latencies computed from a subsample including all subjects except for subject i. More specifically, to obtain each D_i, one first computes the grand-average LRP for each of the two conditions, averaging across all subjects except subject i. Then, one checks each of these two grand averages to see at which point the criterion level of LRP is reached, thereby obtaining a latency estimate for each grand average. The value of D_i is then the difference between these two latency estimates. If \bar{J} is the mean of the differences obtained in the subsamples (i.e., \bar{J} = \sum_{i=1}^{N} D_i / N), then the jackknife estimate of the standard error of the difference, s_D, is

s_D = \sqrt{ \frac{N-1}{N} \sum_{i=1}^{N} (D_i - \bar{J})^2 }.   (2)

(The Appendix presents a numerical example to illustrate the proposed computations in more detail.) Unlike the traditional measure of standard error, this technique compares variation in the quantity of interest across subsets of the total sample rather than across individuals.
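A sketch of this leave-one-out computation follows (Python; it reuses the hypothetical latency_difference helper from the earlier sketch, and the array layout is an assumption for illustration rather than part of the published description).

import numpy as np

def jackknife_se_of_difference(lrps_control, lrps_experimental, times_ms, rel_criterion=0.5):
    # lrps_control, lrps_experimental: arrays of shape (N participants, n samples),
    # each row holding one participant's average LRP in that condition.
    n = lrps_control.shape[0]
    d_sub = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i                          # leave participant i out
        grand_c = lrps_control[keep].mean(axis=0)         # grand average without i
        grand_e = lrps_experimental[keep].mean(axis=0)
        d_sub[i] = latency_difference(grand_c, grand_e, times_ms, rel_criterion)
    j_bar = d_sub.mean()                                  # mean of the subsample differences
    s_d = np.sqrt((n - 1) / n * np.sum((d_sub - j_bar) ** 2))   # Equation 2
    return s_d, d_sub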

In brief, its conceptual basis is to judge the variability between subjects by temporarily leaving each subject out of the calculation. If all participants show approximately the same effect, then the values of D_i should be quite similar whichever subject is omitted. If the participants show different effects, however, then the results should fluctuate substantially depending on which subject is left out. Although some may find its conceptual basis less intuitive than that of the traditional techniques, jackknifing nonetheless deserves serious consideration because it has a sound theoretical basis and useful distributional properties and because it is sometimes superior to traditional methods (Efron, 1981). In various standard cases (e.g., standard error of a sample mean), the jackknife estimate of standard error is mathematically equivalent to the classical estimate of standard error.

In summary, we propose to (a) measure the difference in LRP onset latencies between two conditions by taking the difference, D, in the times at which the grand-average LRPs cross a certain fairly large criterion value, and (b) estimate the standard error of this difference with the jackknife standard error, s_D, obtained from Equation 2. In the remainder of this article, we report simulations designed to evaluate this procedure and some alternative procedures. In these simulations, we also examined the effects of using various different cutoff values in conjunction with the new procedure. These cutoff values were defined either in absolute terms (e.g., 0.5 µV, 1.0 µV) or in relative terms (e.g., 30% of the maximum value of the LRP), and it turned out that criteria defined in relative terms lead to more accurate difference estimates than criteria defined in absolute terms.

Simulations

To evaluate the proposed new measurement procedure and compare its accuracy with that of other procedures, we need to determine the sampling distributions of the values produced by each procedure under a known set of conditions.
With these sampling distributions, for example, it would be a simple matter to see which procedure yielded the best estimate on average, which one produced the least random variation in estimation, and so on. Unfortunately, the measurement procedures are complicated and the distributional properties of the underlying sources of noise are unknown, so it is not possible to calculate these sampling distributions analytically. It is possible to estimate these sampling distributions by simulation, however, so that is the approach we have taken.

Overview and General Method

In each of the present simulations, a two-step procedure was iterated 1,000 times, with each iteration simulating one whole LRP experiment and its analysis. The results of the 1,000 analyses were tabulated in various ways to see how well the analysis recovered the true LRP onset latency difference, as described further below. Each iteration had two main steps: (a) Generate a random set of data corresponding to the outcome of a single experiment. As described below, this data set was drawn from a population with known true differences in stimulus- and response-locked LRP onset latencies. (b) Analyze the generated data to obtain estimates of the between-condition differences in stimulus-locked LRP latencies and in response-locked LRP latencies.

To get valid simulation results, it is crucial that the generated data sets be as realistic as possible, but this realism is not easy to achieve. To generate simulated EEG data, previous researchers have had to make somewhat arbitrary assumptions about the size and temporal pattern of EEG noise and about the characteristics of the signal of interest (e.g., the LRP), including its average shape, subject-to-subject variability in its shape, and trial-to-trial variability in its shape within each participant. To generate the response-locked LRPs needed in the present case, we would also have to make assumptions about the distributions of RTs both within and between subjects and about the precise temporal relationship of the EEG to the overt manual response. Fortunately, in the present case, it is possible to derive appropriate simulated data sets from actual observed data sets, thereby eliminating the need for such arbitrary assumptions. Thus, the simulations reported in this article were based on actual data, and we used two different data sets as a means of cross-validation.

Figure 2 illustrates the method used to generate simulated data corresponding to an experiment with two conditions (referred to as the experimental and control conditions), starting from actual observed data. With this method, the true mean RT is adjusted to be 100 ms larger, on average, in the experimental condition than in the control condition, and this effect on RT is entirely due to a 100-ms increase in stimulus-locked LRP onset latency, with no change in response-locked LRP.

Figure 2. Illustration of the procedure for generating simulated data for two conditions with a 100-ms difference in stimulus-locked onset latencies. A represents the pool of all actual experimental trials for a given real participant. B and C show reaction times (RTs) and lateralized readiness potentials (LRPs) for two trials sampled randomly from this pool and assigned to the experimental and control conditions. D and E show the data actually used in the simulations, as derived from B and C, respectively. For the control condition, the randomly selected RT and LRP are used without modification. For the experimental condition, the selected LRP is shifted 100 ms later in time, and RT is increased by 100 ms.

Figure 2A represents the LRPs obtained in a single pool of actual experimental trials from a single participant in a single condition from the observed data (e.g., trials with left-hand responses) (see Footnote 2). Figures 2B and 2C show observed LRPs on two trials, each with its own RT, drawn randomly from this pool and then assigned randomly to the experimental and control conditions. Figures 2D and 2E represent the trials actually used in the simulation, as derived from those depicted in Figures 2B and 2C, respectively. In the control condition, the observed data are used in the simulation without any modification; that is, the simulated data in Figure 2D are identical to the observed data in Figure 2B. In the experimental condition, the data are modified in two ways before being used in the simulation. First, the entire EEG waveform is shifted 100 ms to the right along the time line, so that each simulated EEG reading occurs 100 ms later in the simulated data than in the actual data (see Footnote 3). Second, 100 ms are added to the RT, so that the relation of the LRP to the response is the same in the simulated data as in the observed data. Adding 100 ms to each RT randomly assigned to the experimental condition increases the mean RT for that condition by 100 ms relative to the mean RT in the control condition, although RTs still vary considerably within both conditions.

This method of constructing simulated data has several useful properties. First, the method guarantees that the true difference in stimulus-locked LRP onsets is 100 ms, despite the fact that we do not know precisely when LRP onset occurred either in the original pool of trials or in the constructed experimental and control conditions. Whenever the LRP started without the shift, it must start 100 ms later with the shift, because the EEG is shifted 100 ms later in the experimental condition. Second, the construction guarantees that there is no true difference in the onsets of response-locked LRPs because RT and EEG are shifted by the same amount. Third, because they are derived from real data, the simulated data are realistic with respect to EEG noise, LRP signals, trial-to-trial and subject-to-subject variability in EEG and RT, and so forth (see Footnote 4).

Footnote 2: Technically, it is inaccurate to use "LRP" to refer to an asymmetry observed on trials with a single response hand because the LRP is derived by averaging C3'/C4' asymmetries over trials with both left- and right-hand responses. Nonetheless, it is simplest to describe our method in terms of what happened with trials for a single hand; the same method was used for the other hand, and then the results were averaged across hands to obtain a true LRP.

Footnote 3: The figure is somewhat oversimplified because it suggests that we simply shifted the LRP difference score, C3' minus C4' or C4' minus C3', depending on the condition. In fact, we actually shifted both C3' and C4' because one of the methods to be examined (i.e., the Wilcoxon) requires that these two channels be kept separate. A further complication is that shifting all the EEGs 100 ms later necessarily creates a gap for the first 100 ms of the baseline period. In most cases, this gap was filled in with the original values of the first 100 ms as they had existed before the shift, except in reverse order to avoid creating a discontinuity at 100 ms. This procedure was not appropriate for the baseline deviation method (described later) because it artifactually reduced the estimate of variability during the baseline period. For simulations involving this method, we shifted only the EEG readings after the end of the baseline period to the right along the time axis (i.e., to later time points). With this procedure, the gap was created during the first 100 ms after the end of the baseline period, and we again filled it by reinstating its original values in reverse order.

Footnote 4: In one respect, the simulated data could be unrealistic in the experimental condition with the stimulus-locked effect. Because EEG was shifted in this condition, any event-related desynchronization (e.g., Pfurtscheller & Aranibar, 1977) present in the EEG would start later in the experimental condition than in the control condition.
It should be emphasized that the LRPs derived from the simulated data will also vary realistically from one simulated experiment to the next because data from different combinations of participants were used in different simulated experiments, and different trials were randomly assigned to the experimental and control conditions in each simulated experiment. In one simulated experiment, for example, the trials with faster responses might by chance tend to be assigned to the control condition, in which case the observed effect of condition on RT would be greater than 100 ms for this simulated experiment. Similarly, in another simulated experiment, the control condition might contain trials with especially large or small observed LRPs, with especially early or late observed stimulus-locked LRPs, and so forth. In short, although the only true differences between experimental and control conditions are the 100-ms differences in RT and in stimulus-locked LRP onsets, the simulated data will contain various observed differences that might arise by chance when sampling actual trials from two such conditions (see Footnote 5).

The method illustrated in Figure 2 can easily be modified to construct simulated data sets that would be obtained if the experimental manipulation increased response-locked LRP latency or increased both stimulus- and response-locked latencies. To generate data for an experiment with a 100-ms effect on response-locked LRP, for example, 100 ms are added to RT for trials in the experimental condition, but EEG is not shifted for either condition. With the response moved 100 ms later and the EEG left as it was, the LRP must by definition start 100 ms earlier, relative to the response, in the experimental condition than in the control. The experimental and control conditions have identical underlying stimulus-locked LRPs because the relation of EEG to stimulus onset is not altered. As another example, it is also possible to generate data for an experiment with 100-ms effects on both stimulus- and response-locked LRP onsets. To generate these data, 200 ms are added to RTs in the experimental condition, but EEGs in this condition are shifted by only 100 ms.
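The core of this trial-level construction is small enough to sketch directly. The following Python fragment is an illustration only, not the authors' code: the function and variable names and the array layout are invented, the 250-Hz sampling rate is taken from the data set described later, and the mirror fill of the 100-ms gap follows the description in Footnote 3.

import numpy as np

SAMPLE_MS = 4                 # 250-Hz sampling, as in the data set described below
SHIFT_PTS = 100 // SAMPLE_MS  # number of samples in a 100-ms shift

def make_experimental_trial(eeg, rt_ms):
    # Derive a simulated "experimental" trial with a 100-ms stimulus-locked delay.
    # eeg: 1-D array of samples for one channel (both C3' and C4' are shifted the
    # same way in the actual procedure); rt_ms: the trial's reaction time in ms.
    shifted = np.empty_like(eeg)
    shifted[SHIFT_PTS:] = eeg[:-SHIFT_PTS]       # every reading occurs 100 ms later
    shifted[:SHIFT_PTS] = eeg[:SHIFT_PTS][::-1]  # fill the gap with the original first
                                                 # 100 ms in reverse order (no discontinuity)
    return shifted, rt_ms + 100                  # RT increases by the same 100 ms

# Control trials are used exactly as observed; for a purely response-locked effect,
# only the RT is increased and the EEG is left unshifted.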
Estimation of 100-ms Effects

Table 1 summarizes the results of two simulations conducted to see how accurately the new method estimates 100-ms effects on stimulus- or response-locked LRPs. For the simulation shown in the left half of the table, trials selected for the experimental condition were modified to implement a 100-ms effect on stimulus-locked LRP onset latency (i.e., as shown in Figure 2). For the simulation shown in the right half of the table, trials selected for the experimental condition were modified to implement a 100-ms effect on response-locked LRP onset latency (i.e., 100 ms were added to RT but EEG was not shifted). For each simulated experiment in both types of simulations, N = 8 participants were sampled without replacement from a pool of 20 actual experimental participants whose data had been collected in connection with another project (Miller, Ulrich, & Rinkenauer, 1997) (see Footnote 6).

Footnote 5: The only real simplification embodied in the simulation procedure is that the experimental effect is 100 ms for all participants and all trials. This simplification is unlikely to be exactly true because there is surely some variation in effect size across trials and participants. However, the evidence suggests that the effect size variance is small enough to be safely ignored. Consider RT: To the extent that effect size varies from trial to trial, the standard deviations of individual trial RTs, computed across trials within a participant, should tend to be larger in the experimental condition than in the control condition. In our experience, however, there is usually not much difference in these standard deviations, suggesting that trial-to-trial variance in effect size is a rather small proportion of the within-subject variance. Similarly, if the effect size varies across participants, then the standard deviation of the mean RTs should be larger in the experimental condition than in the control condition; again, no such large differences are evident. Similar considerations suggest that the effects of condition on the size and onset latencies of the LRP have little variance relative to other sources.

Table 1. Mean (M) and Standard Deviation (SD) of Estimated Differences (D) in Stimulus-Locked (SL) and Response-Locked (RL) Onset Latency. The table crosses the two simulated effects (100-ms stimulus-locked effect; 100-ms response-locked effect) with the SL and RL onset estimates (M and SD for each), for the following methods and criteria: absolute criteria (µV), relative criteria (% of maximum amplitude), Wilcoxon (critical p level for determining onset), baseline deviation (number of noise SDs), and half-amplitude (% of maximum amplitude). [Numeric table entries not reproduced.]

Footnote 6: For each participant in this experiment, we had observed approximately 150 artifact-free trials per hand. On each trial, we had recorded a prestimulus baseline period of 200 ms and a poststimulus epoch of 2,000 ms, sampling at 250 Hz, with bandpass settings of [...] Hz and impedances below 5 kΩ. The raw EEG was filtered off-line by using a low-pass filter with a half-power cutoff of 4 Hz. All of the participants had discernible response-locked LRPs.

A new random sample of participants was chosen for each simulated experiment so that the simulated data sets would vary in between-subjects variation, just as actual data sets do. For each participant within a simulated experiment, 50 trials were randomly assigned to each of the two simulated conditions.

The rows of Table 1 correspond to different scoring methods and criteria, including both the proposed new method, discussed in this section, and several alternative methods, discussed in the next section. Each scoring method was used on the identical simulated data sets, so that the accuracy of different methods could be compared directly. We tried several different scoring criteria with each method because a given method may be more or less effective depending on the exact criterion chosen. For example, the criterion of 0.5 µV considered above (e.g., Figure 1) is a somewhat arbitrary choice. To help identify the most effective criterion to use with the new method, we scored the data with five different criteria defined in absolute terms (0.2, 0.4, 0.6, 0.8, or 1.0 µV) and also with five criteria defined in relative terms (10, 30, 50, 70, or 90% of the maximum LRP amplitude). Similarly, we also tried various criteria with the other methods because there is no way to judge a priori which criterion is likely to be most effective (e.g., Osman & Moore, 1993; Smulders et al., 1996).

In Table 1, the left-most column labeled M shows the mean estimated difference in stimulus-locked onset latencies (experimental condition minus control condition), averaging across 1,000 simulated experiments. Evidently, the new method is accurate in the long run because it produces mean differences that are very close to the true value of 100 ms for almost every criterion tried. The adjacent column, labeled SD, shows the standard deviations of the 1,000 estimated differences computed with each criterion (inspection of frequency distributions indicated that the distributions of estimated differences were approximately normal for all criteria). The relative criteria of 30-70% produced the smallest standard deviations, which indicates that they provide the best estimates of the difference (i.e., they are subject to less random variability around the true value from one experiment to the next) (see Footnote 7).

Footnote 7: The means and standard deviations shown in this table should not be interpreted as precise quantitative estimates in cases in which the SD is very large.
Such large standard deviations indicate that onset latency estimation by that method was contaminated by outliers arising in many simulated experiments. For example, the criterion might be satisfied at the end of the baseline period, producing a latency of zero in that data set, or the criterion might never be satisfied anywhere in the epoch, in which case the end of the epoch (2,200 ms) was taken as the latency.

The third and fourth columns of the table show the analogous results for the estimated differences in response-locked onset latencies. The means of the estimated differences are approximately zero, in accordance with the fact that the data for this simulation were constructed to have equal response-locked onsets in both conditions. In this case, the 50-90% relative criteria produced the smallest standard deviations, indicating that they are the most accurate estimators. The rightmost four columns of the table show the results for the complementary simulation in which the data were constructed with a 100-ms difference in response-locked onsets and no difference in stimulus-locked onsets. On average, the estimated differences in stimulus-locked onset latencies are quite close to zero, as they should be, and the estimated differences in response-locked onsets are close to 100 ms, which is consistent with the fact that LRP onset was 100 ms earlier in the experimental condition than in the control condition. The relative criteria again produced the smallest standard deviations, with the optimal criteria again being 30-70% for the measurement of stimulus-locked onsets and 50-90% for the measurement of response-locked onsets.

In summary, the results presented in Table 1 indicate that the new method is very promising, especially when used with a relative criterion of approximately 50% for stimulus-locked onsets and 90% for response-locked onsets. In the long run, the estimated differences are essentially identical to their true values, and the estimates do not vary much around the true value from one experiment to the next. Many questions about the new method remain, however, before we can recommend that LRP researchers routinely apply it.

It is also interesting to note that estimation of onset differences is more accurate in response-locked waveforms than in stimulus-locked ones. This finding would seem to be a natural consequence of the fact that the LRP is better time locked to the response than to the stimulus (Coles, 1989), which implies a higher signal-to-noise ratio for response-locked LRPs.

Comparison with Other Methods

Perhaps the most obvious question is how the new method compares with other possible methods. We report on the accuracy of three alternative methods, two of which have been used in several previous studies.

The Wilcoxon method was first described by Van Dellen et al. (1985) and subsequently used by De Jong, Wierda, Mulder, and Mulder (1988) and Smid, Lamain, Hogeboom, Mulder, and Mulder (1991), among others. In brief, for each participant and condition, a time series of Wilcoxon statistics is computed, with each Wilcoxon providing a nonparametric test of the null hypothesis that the C3'/C4' difference at that time point is the same for trials on which the left and right hands are activated. Then, t tests across subjects are computed by using the values of the Wilcoxon statistic as data points. The first time point yielding a significant t value in the expected direction can be taken as the LRP onset latency (see Footnote 8). In our simulations, we examined the effectiveness of this method using t critical values with two-tailed significance levels of .05, .025, and [...].

Footnote 8: Smid, Böcker, van Touw, Mulder, and Brunia (1996) noted that the Wilcoxon test is usually found to be significantly different from zero for more than 200 ms (p. 7) and thus defined LRP onset as the point at which such extended significance (one-tailed) was obtained (Mulder, personal communication, 1996). We became aware of this variant of the Wilcoxon procedure too late to include it in the present simulations.
The baseline deviation method was described and used by Osman et al. (1992). In brief, the stimulus-locked LRP waveform for each participant in each condition is examined to find the first time point at which the LRP begins to exceed consistently a criterion value set to 2.5 times the standard deviation of noise LRP, estimated from LRP fluctuations during the baseline period. "Consistently exceed" meant that the average LRP during each of the two 50-ms windows following the estimated LRP onset also had to exceed the criterion (Osman, personal communication, 1995). In our simulations, we also examined the effectiveness of this method with the criterion defined as 2.0 or 3.0 times the noise standard deviation.

The third method we examined was a half-amplitude method carried out at the level of the individual participant data rather than the grand averages. With this method, the LRP for each participant in each condition is examined to find the moment at which it reaches half of its maximum amplitude, and this moment is taken as the LRP onset for that participant and condition. Relative to the proposed jackknife procedure, this method has the advantage that it can be used with traditional statistical tests (e.g., t tests) because an onset is obtained for each participant in each condition. In our simulations, we also examined the effectiveness of this method with the criterion defined as 10, 30, 70, or 90% of the maximum amplitude.

Two other problems arose when scoring the simulated data with the baseline deviation and half-amplitude methods because these methods yield a separate measure of onset latency for each participant. First, it is possible for a participant's data to satisfy the criterion at time zero (i.e., the time of stimulus onset). Second, with the baseline deviation method, it is possible that a participant's data do not satisfy the criterion at any time point, leading to an undefined latency. Simulated participants with either of these problems were excluded from the sample when computing the overall results (e.g., mean onset latency) for a simulated data set, as they would have to be in the analysis of real data sets.

Table 1 also shows summary statistics for the baseline deviation, half-amplitude, and Wilcoxon methods computed from the same 2,000 simulated data sets used to estimate differences for the jackknife-based method. In most cases, these methods are also reasonably unbiased, but they produce higher standard deviations than the jackknife-based method, indicating that they are less accurate estimators of the true difference.

To gain some intuition about why the jackknifing method works so well relative to the other methods, it is helpful to compare this method with the half-amplitude method carried out at the level of the individual-subject LRPs. When the jackknifing method is defined in relative terms (e.g., 50% of maximum amplitude), it differs from the half-amplitude method only with respect to the averaging done prior to latency determination. Using the half-amplitude method, LRPs are averaged and latencies determined for each participant separately, and then latencies are averaged across participants to overcome experimental noise. With the proposed new method, LRPs are averaged across participants before latencies are determined, so the experimental noise is overcome at an earlier stage of the analysis.
The simulation results suggest that it is better to average out experimental noise earlier rather than later because the latency in the average, provided by the new method, is much more stable than the average of the latencies, provided by the old method (cf. Table 1). Intuitively, it is easy to see why this is true. If the LRP noise is large enough that the criterion can be crossed before the LRP has started, then the criterion level will be reached at an almost entirely random latency.

Even an average of these random latencies will not be a stable quantity. Conversely, if LRP noise is kept small enough that the criterion will not be crossed until the LRP has started (e.g., by averaging across participants), then the noise will cause only small random variation in the moment of reaching the criterion level. In short, a small increase in LRP noise can produce a big increase in the variability of latency estimates, and the new method works well because it obtains latency estimates only when LRP noise is small.

Hypothesis Testing

Although the jackknife-based method clearly gives good estimates of latency differences on the average across experiments, an equally important question for researchers is whether the method can be used to make inferences from a single set of experimental results. Typically, for example, a researcher will want to test the null hypothesis that the true difference is zero, both for stimulus-locked and for response-locked onsets. Using the new method, the researcher would compute

t_J = D / s_D,   (3)

where D is the difference in stimulus- or response-locked latencies observed in the overall sample, and s_D is the estimate of its standard error obtained from Equation 2. According to the null hypothesis, the quantity t_J should have approximately a t distribution with N - 1 degrees of freedom because D is approximately normal with a mean of zero and s_D is an estimate of the standard deviation of D. Thus, the researcher would reject the null hypothesis if the observed value of t_J exceeded the critical cutoff obtained from a standard t table.
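Continuing the earlier sketches (and reusing the hypothetical latency_difference and jackknife_se_of_difference helpers defined there, under the same illustrative assumptions), the test in Equation 3 amounts to only a few lines of Python:

from scipy import stats

def jackknife_t_test(lrps_control, lrps_experimental, times_ms, rel_criterion=0.5):
    # t_J = D / s_D, referred to a t distribution with N - 1 degrees of freedom.
    n = lrps_control.shape[0]
    d = latency_difference(lrps_control.mean(axis=0),
                           lrps_experimental.mean(axis=0),
                           times_ms, rel_criterion)                    # full-sample D
    s_d, _ = jackknife_se_of_difference(lrps_control, lrps_experimental,
                                        times_ms, rel_criterion)       # Equation 2
    t_j = d / s_d
    p = 2 * stats.t.sf(abs(t_j), df=n - 1)                             # two-tailed p value
    return t_j, p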
To see how well this procedure would work, we computed 4,000 values of t_J from the 2,000 simulated experiments used to construct Table 1 (one value of t_J to test the null hypothesis of zero difference in stimulus-locked onsets and one to test the null hypothesis of zero difference in response-locked onsets). We then tabulated the proportion of simulated experiments in which the resulting t_J was significant by using significance cutoffs for p levels of .05 and .01. We conducted this exercise for each of the different scoring methods and criteria. Computation of t values was straightforward for the baseline deviation and half-amplitude methods because these provide difference estimates for each participant, thus allowing computation of the standard error of the difference by using the usual formula (i.e., Equation 1). As described previously, the Wilcoxon procedure does not supply any estimate of standard error. Thus, we also employed jackknifing with this procedure to obtain estimated standard errors by computing the estimated latency difference via this method for each of the subsamples of N - 1 participants and applying Equation 2.

Table 2 shows the most informative results. For each method, it shows the probability of correctly rejecting the null hypothesis (i.e., power) separately for tests of a stimulus-locked effect in the simulation where this effect was present and for tests of a response-locked effect in the simulation having the response-locked effect. We also estimated Type I error probabilities by tabulating the proportion of times that a significant response-locked effect was obtained in the simulation where the stimulus-locked effect was present, and vice versa, but these are not shown because they were close to or smaller than the nominal values of .05 and .01 in all cases.

Table 2. Power as a Function of Stimulus- Versus Response-Locked Effect, Scoring Method and Criterion, and Significance Level (.05 vs. .01). Rows cover the same methods and criteria as Table 1: absolute criteria (µV), relative criteria (%), Wilcoxon (critical p level for determining onset), baseline deviation (number of noise SDs), and half-amplitude (% of maximum amplitude). [Numeric table entries not reproduced.]

With appropriate criteria, the jackknife-based method once again does extremely well. The power of this method is quite large, even at the more stringent .01 significance level. In combination with the low Type I error probabilities, this finding suggests that researchers can be appropriately confident of reaching the correct conclusion when using this method to look for a 100-ms effect in an experiment of this size (i.e., eight participants with 50 trials per participant) and with an amount of EEG noise comparable to that present in this data set. We will consider in a later section the question of how well the method does with smaller effects and with experiments of other sizes. In contrast, the baseline deviation, half-amplitude, and Wilcoxon methods do not perform well. Clearly, they have much less power than the jackknife-based method under the conditions implemented in these simulations.

Confidence Intervals

Besides testing null hypotheses, experimenters often wish to compute confidence intervals for the sizes of their experimental effects. Using the present method, the bounds of a 95% confidence interval for a true difference could be constructed by using the formula

bounds = D \pm t_{.05} s_D,   (4)

where D is the latency difference computed from the grand averages including the full sample, s_D is the jackknife-based estimate of its standard error, and t_{.05} is the critical t value with N - 1 degrees of freedom for the .05 alpha level.
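A corresponding interval computation, under the same assumptions as the earlier sketches, is equally short:

from scipy import stats

def jackknife_confidence_interval(d, s_d, n, confidence=0.95):
    # Equation 4: D +/- t_crit * s_D with N - 1 degrees of freedom.
    t_crit = stats.t.ppf(1 - (1 - confidence) / 2, df=n - 1)
    return d - t_crit * s_d, d + t_crit * s_d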

For each of the 2,000 simulated experiments discussed to this point, we computed 95% confidence intervals around the true mean by using each method with the same error terms described in the previous section. All methods produced confidence intervals with a high likelihood of containing the true value (0 or 100 ms, depending on the effect simulated), so in this sense all methods performed well. As shown in Table 3, however, the methods differed widely in the mean half-width of the computed confidence intervals (averaging across simulated experiments). The smallest half-widths, which reflect the most precise difference estimates, were obtained with relative criteria of 50% when estimating stimulus-locked onsets and of 90% when estimating response-locked onsets. That is, under experimental conditions comparable to those simulated here, these methods would on average allow an experimenter to estimate an onset latency difference to within ±36 ms for stimulus-locked waveforms and to within ±19 ms for response-locked ones. Clearly, these methods would also be quite powerful in detecting effects smaller than 100 ms, a point to which we will return in a subsequent section (see Footnote 9).

Table 3. Mean Half-Widths of 95% Confidence Intervals Computed for Differences in Stimulus-Locked (SL) and Response-Locked (RL) Onsets as a Function of Simulated Effect and Scoring Method and Criterion. Columns give SL and RL onsets under the stimulus-locked and response-locked effects; rows cover the same methods and criteria as Tables 1 and 2. [Numeric table entries not reproduced.]

Footnote 9: Some readers may be puzzled by apparent inconsistencies that arise when comparing Tables 1 and 3. Specifically, the mean half-widths shown in Table 3 do not order the different methods and criteria in exactly the same fashion as the SD values shown in Table 1. For example, the mean half-width shown in Table 3 is 36 ms for estimating a stimulus-locked effect with the relative criterion of 50% (eighth row and second column in the table), whereas it is 44 ms for the relative criterion of 70%. This comparison of half-widths suggests that the 50% criterion is more accurate than the 70% criterion. However, the SD values shown in Table 1 for these same two criteria are 14.7 and 14.1 ms, respectively, suggesting that the 70% criterion is more accurate. The discrepancy between the half-widths in Table 3 and the SDs in Table 1 arises because the two tables use somewhat different measures of variability. A confidence interval is computed by using a standard error value estimated from a single simulated experiment with Equation 1 or Equation 2, as appropriate. Thus, the mean confidence interval half-width is sensitive to the mean of these individual-experiment standard error estimates, averaging across simulated experiments. In contrast, the SDs shown in Table 1 reflect variation in the estimated difference scores across simulated experiments. Discrepancies between these two measures of variability can arise when the standard error of a difference estimated from a single experiment is a biased estimate of the true variability of differences across experiments. In fact, such biases were present in the current simulation results. For all procedures, means of the single-experiment standard errors were slightly larger than the actual standard deviations of the difference scores across experiments. Moreover, the amount of overestimation varies slightly from one method and criterion to the next. With the 50% criterion, overestimation was slight: The mean of the individual-experiment standard errors was 15.25, which is only slightly larger than the actual standard deviation of 14.7. Overestimation was larger with the 70% criterion: The mean of the standard errors was 18.65, which is substantially larger than the actual standard deviation of 14.1. The greater overestimation of standard error with the 70% criterion than with the 50% criterion is responsible for the discrepancy. In summary, the 70% criterion provides a better point estimate of the difference, but the 50% criterion provides a better interval estimate because of its superior estimation of standard error. The same considerations apply in the case of other apparent discrepancies that can be found in comparing these two tables.

Generality of Application

Because we have discussed only two specific simulations thus far, a salient additional question about the new method is whether it works well under a wide variety of circumstances. This section describes, albeit briefly, additional simulations designed to see how well the method works under various other conditions. In the interests of brevity, it is convenient to summarize the results of the additional simulations by using a single number to measure the ability of each method to discriminate between situations with true versus false null hypotheses under the conditions embodied in the simulation.
One such measure of discriminability is the quantity d' used in signal detection theory (e.g., Green & Swets, 1966). For each scoring method, an estimate of d' can be computed from

d' = (\bar{X}_0 - \bar{X}_1) / \sqrt{ (S_0^2 + S_1^2) / 2 },   (5)

where \bar{X}_0 and S_0 are the mean and standard deviation, respectively, of the difference estimates produced by that scoring method across 1,000 simulated experiments in which the null hypothesis is true, and \bar{X}_1 and S_1 are the corresponding mean and standard deviation, respectively, obtained from 1,000 simulated experiments in which the null hypothesis is false. Using values shown in Table 1, for example, it is possible to estimate the d' with which the jackknife-based measure with a relative criterion of 50% discriminates between cases with stimulus-locked onset differences of 0 versus 100 ms; the result is d' = 6.8. In brief, d' increases with the separation between the distributions of scores obtained with true versus false null hypotheses, so a larger d' indicates a method that discriminates better between the two situations.
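A minimal computation of this index from two sets of simulated difference estimates (Python; the use of the sample standard deviation and of the absolute value in the numerator are assumptions for illustration, since the text does not specify these details):

import numpy as np

def d_prime(est_null, est_alt):
    # Equation 5: separation of the difference estimates obtained when the true
    # difference is 0 (null true) versus 100 ms (null false).
    pooled_sd = np.sqrt((est_null.std(ddof=1) ** 2 + est_alt.std(ddof=1) ** 2) / 2)
    return abs(est_null.mean() - est_alt.mean()) / pooled_sd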


More information

Regression Discontinuity Analysis

Regression Discontinuity Analysis Regression Discontinuity Analysis A researcher wants to determine whether tutoring underachieving middle school students improves their math grades. Another wonders whether providing financial aid to low-income

More information

Comparing Pre-Post Change Across Groups: Guidelines for Choosing between Difference Scores, ANCOVA, and Residual Change Scores

Comparing Pre-Post Change Across Groups: Guidelines for Choosing between Difference Scores, ANCOVA, and Residual Change Scores ANALYZING PRE-POST CHANGE 1 Comparing Pre-Post Change Across Groups: Guidelines for Choosing between Difference Scores, ANCOVA, and Residual Change Scores Megan A. Jennings & Robert A. Cribbie Quantitative

More information

Technical Specifications

Technical Specifications Technical Specifications In order to provide summary information across a set of exercises, all tests must employ some form of scoring models. The most familiar of these scoring models is the one typically

More information

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo Please note the page numbers listed for the Lind book may vary by a page or two depending on which version of the textbook you have. Readings: Lind 1 11 (with emphasis on chapters 10, 11) Please note chapter

More information

Full title: A likelihood-based approach to early stopping in single arm phase II cancer clinical trials

Full title: A likelihood-based approach to early stopping in single arm phase II cancer clinical trials Full title: A likelihood-based approach to early stopping in single arm phase II cancer clinical trials Short title: Likelihood-based early stopping design in single arm phase II studies Elizabeth Garrett-Mayer,

More information

Title: Healthy snacks at the checkout counter: A lab and field study on the impact of shelf arrangement and assortment structure on consumer choices

Title: Healthy snacks at the checkout counter: A lab and field study on the impact of shelf arrangement and assortment structure on consumer choices Author's response to reviews Title: Healthy snacks at the checkout counter: A lab and field study on the impact of shelf arrangement and assortment structure on consumer choices Authors: Ellen van Kleef

More information

Supporting Information

Supporting Information 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 Supporting Information Variances and biases of absolute distributions were larger in the 2-line

More information

LAB 1: MOTOR LEARNING & DEVELOPMENT REACTION TIME AND MEASUREMENT OF SKILLED PERFORMANCE. Name: Score:

LAB 1: MOTOR LEARNING & DEVELOPMENT REACTION TIME AND MEASUREMENT OF SKILLED PERFORMANCE. Name: Score: LAB 1: MOTOR LEARNING & DEVELOPMENT REACTION TIME AND MEASUREMENT OF SKILLED PERFORMANCE Name: Score: Part I: Reaction Time Environments Introduction: Reaction time is a measure of how long it takes a

More information

Analysis of data in within subjects designs. Analysis of data in between-subjects designs

Analysis of data in within subjects designs. Analysis of data in between-subjects designs Gavin-Ch-06.qxd 11/21/2007 2:30 PM Page 103 CHAPTER 6 SIMPLE EXPERIMENTAL DESIGNS: BEING WATCHED Contents Who is watching you? The analysis of data from experiments with two conditions The test Experiments

More information

Sum of Neurally Distinct Stimulus- and Task-Related Components.

Sum of Neurally Distinct Stimulus- and Task-Related Components. SUPPLEMENTARY MATERIAL for Cardoso et al. 22 The Neuroimaging Signal is a Linear Sum of Neurally Distinct Stimulus- and Task-Related Components. : Appendix: Homogeneous Linear ( Null ) and Modified Linear

More information

MODULE S1 DESCRIPTIVE STATISTICS

MODULE S1 DESCRIPTIVE STATISTICS MODULE S1 DESCRIPTIVE STATISTICS All educators are involved in research and statistics to a degree. For this reason all educators should have a practical understanding of research design. Even if an educator

More information

Detection Theory: Sensitivity and Response Bias

Detection Theory: Sensitivity and Response Bias Detection Theory: Sensitivity and Response Bias Lewis O. Harvey, Jr. Department of Psychology University of Colorado Boulder, Colorado The Brain (Observable) Stimulus System (Observable) Response System

More information

Bayesian and Frequentist Approaches

Bayesian and Frequentist Approaches Bayesian and Frequentist Approaches G. Jogesh Babu Penn State University http://sites.stat.psu.edu/ babu http://astrostatistics.psu.edu All models are wrong But some are useful George E. P. Box (son-in-law

More information

Statistical reports Regression, 2010

Statistical reports Regression, 2010 Statistical reports Regression, 2010 Niels Richard Hansen June 10, 2010 This document gives some guidelines on how to write a report on a statistical analysis. The document is organized into sections that

More information

Louis Leon Thurstone in Monte Carlo: Creating Error Bars for the Method of Paired Comparison

Louis Leon Thurstone in Monte Carlo: Creating Error Bars for the Method of Paired Comparison Louis Leon Thurstone in Monte Carlo: Creating Error Bars for the Method of Paired Comparison Ethan D. Montag Munsell Color Science Laboratory, Chester F. Carlson Center for Imaging Science Rochester Institute

More information

Comments on Significance of candidate cancer genes as assessed by the CaMP score by Parmigiani et al.

Comments on Significance of candidate cancer genes as assessed by the CaMP score by Parmigiani et al. Comments on Significance of candidate cancer genes as assessed by the CaMP score by Parmigiani et al. Holger Höfling Gad Getz Robert Tibshirani June 26, 2007 1 Introduction Identifying genes that are involved

More information

Pooling Subjective Confidence Intervals

Pooling Subjective Confidence Intervals Spring, 1999 1 Administrative Things Pooling Subjective Confidence Intervals Assignment 7 due Friday You should consider only two indices, the S&P and the Nikkei. Sorry for causing the confusion. Reading

More information

MMI 409 Spring 2009 Final Examination Gordon Bleil. 1. Is there a difference in depression as a function of group and drug?

MMI 409 Spring 2009 Final Examination Gordon Bleil. 1. Is there a difference in depression as a function of group and drug? MMI 409 Spring 2009 Final Examination Gordon Bleil Table of Contents Research Scenario and General Assumptions Questions for Dataset (Questions are hyperlinked to detailed answers) 1. Is there a difference

More information

MEA DISCUSSION PAPERS

MEA DISCUSSION PAPERS Inference Problems under a Special Form of Heteroskedasticity Helmut Farbmacher, Heinrich Kögel 03-2015 MEA DISCUSSION PAPERS mea Amalienstr. 33_D-80799 Munich_Phone+49 89 38602-355_Fax +49 89 38602-390_www.mea.mpisoc.mpg.de

More information

CHAPTER OBJECTIVES - STUDENTS SHOULD BE ABLE TO:

CHAPTER OBJECTIVES - STUDENTS SHOULD BE ABLE TO: 3 Chapter 8 Introducing Inferential Statistics CHAPTER OBJECTIVES - STUDENTS SHOULD BE ABLE TO: Explain the difference between descriptive and inferential statistics. Define the central limit theorem and

More information

Chapter 5: Field experimental designs in agriculture

Chapter 5: Field experimental designs in agriculture Chapter 5: Field experimental designs in agriculture Jose Crossa Biometrics and Statistics Unit Crop Research Informatics Lab (CRIL) CIMMYT. Int. Apdo. Postal 6-641, 06600 Mexico, DF, Mexico Introduction

More information

Student Performance Q&A:

Student Performance Q&A: Student Performance Q&A: 2009 AP Statistics Free-Response Questions The following comments on the 2009 free-response questions for AP Statistics were written by the Chief Reader, Christine Franklin of

More information

A Comparison of Several Goodness-of-Fit Statistics

A Comparison of Several Goodness-of-Fit Statistics A Comparison of Several Goodness-of-Fit Statistics Robert L. McKinley The University of Toledo Craig N. Mills Educational Testing Service A study was conducted to evaluate four goodnessof-fit procedures

More information

Statistics for Psychology

Statistics for Psychology Statistics for Psychology SIXTH EDITION CHAPTER 3 Some Key Ingredients for Inferential Statistics Some Key Ingredients for Inferential Statistics Psychologists conduct research to test a theoretical principle

More information

Measuring the User Experience

Measuring the User Experience Measuring the User Experience Collecting, Analyzing, and Presenting Usability Metrics Chapter 2 Background Tom Tullis and Bill Albert Morgan Kaufmann, 2008 ISBN 978-0123735584 Introduction Purpose Provide

More information

Psy201 Module 3 Study and Assignment Guide. Using Excel to Calculate Descriptive and Inferential Statistics

Psy201 Module 3 Study and Assignment Guide. Using Excel to Calculate Descriptive and Inferential Statistics Psy201 Module 3 Study and Assignment Guide Using Excel to Calculate Descriptive and Inferential Statistics What is Excel? Excel is a spreadsheet program that allows one to enter numerical values or data

More information

Ambiguous Data Result in Ambiguous Conclusions: A Reply to Charles T. Tart

Ambiguous Data Result in Ambiguous Conclusions: A Reply to Charles T. Tart Other Methodology Articles Ambiguous Data Result in Ambiguous Conclusions: A Reply to Charles T. Tart J. E. KENNEDY 1 (Original publication and copyright: Journal of the American Society for Psychical

More information

Citation for published version (APA): Ebbes, P. (2004). Latent instrumental variables: a new approach to solve for endogeneity s.n.

Citation for published version (APA): Ebbes, P. (2004). Latent instrumental variables: a new approach to solve for endogeneity s.n. University of Groningen Latent instrumental variables Ebbes, P. IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's PDF) if you wish to cite from it. Please check the document

More information

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo Business Statistics The following was provided by Dr. Suzanne Delaney, and is a comprehensive review of Business Statistics. The workshop instructor will provide relevant examples during the Skills Assessment

More information

Confidence Intervals On Subsets May Be Misleading

Confidence Intervals On Subsets May Be Misleading Journal of Modern Applied Statistical Methods Volume 3 Issue 2 Article 2 11-1-2004 Confidence Intervals On Subsets May Be Misleading Juliet Popper Shaffer University of California, Berkeley, shaffer@stat.berkeley.edu

More information

Statistical Techniques. Masoud Mansoury and Anas Abulfaraj

Statistical Techniques. Masoud Mansoury and Anas Abulfaraj Statistical Techniques Masoud Mansoury and Anas Abulfaraj What is Statistics? https://www.youtube.com/watch?v=lmmzj7599pw The definition of Statistics The practice or science of collecting and analyzing

More information

Issues Surrounding the Normalization and Standardisation of Skin Conductance Responses (SCRs).

Issues Surrounding the Normalization and Standardisation of Skin Conductance Responses (SCRs). Issues Surrounding the Normalization and Standardisation of Skin Conductance Responses (SCRs). Jason J. Braithwaite {Behavioural Brain Sciences Centre, School of Psychology, University of Birmingham, UK}

More information

The Effect of Guessing on Item Reliability

The Effect of Guessing on Item Reliability The Effect of Guessing on Item Reliability under Answer-Until-Correct Scoring Michael Kane National League for Nursing, Inc. James Moloney State University of New York at Brockport The answer-until-correct

More information

THE EFFECT OF A REMINDER STIMULUS ON THE DECISION STRATEGY ADOPTED IN THE TWO-ALTERNATIVE FORCED-CHOICE PROCEDURE.

THE EFFECT OF A REMINDER STIMULUS ON THE DECISION STRATEGY ADOPTED IN THE TWO-ALTERNATIVE FORCED-CHOICE PROCEDURE. THE EFFECT OF A REMINDER STIMULUS ON THE DECISION STRATEGY ADOPTED IN THE TWO-ALTERNATIVE FORCED-CHOICE PROCEDURE. Michael J. Hautus, Daniel Shepherd, Mei Peng, Rebecca Philips and Veema Lodhia Department

More information

Post Hoc Analysis Decisions Drive the Reported Reading Time Effects in Hackl, Koster-Hale & Varvoutis (2012)

Post Hoc Analysis Decisions Drive the Reported Reading Time Effects in Hackl, Koster-Hale & Varvoutis (2012) Journal of Semantics, 2017, 1 8 doi: 10.1093/jos/ffx001 Article Post Hoc Analysis Decisions Drive the Reported Reading Time Effects in Hackl, Koster-Hale & Varvoutis (2012) Edward Gibson Department of

More information

Inferential Statistics

Inferential Statistics Inferential Statistics and t - tests ScWk 242 Session 9 Slides Inferential Statistics Ø Inferential statistics are used to test hypotheses about the relationship between the independent and the dependent

More information

Section on Survey Research Methods JSM 2009

Section on Survey Research Methods JSM 2009 Missing Data and Complex Samples: The Impact of Listwise Deletion vs. Subpopulation Analysis on Statistical Bias and Hypothesis Test Results when Data are MCAR and MAR Bethany A. Bell, Jeffrey D. Kromrey

More information

RAG Rating Indicator Values

RAG Rating Indicator Values Technical Guide RAG Rating Indicator Values Introduction This document sets out Public Health England s standard approach to the use of RAG ratings for indicator values in relation to comparator or benchmark

More information

Running head: INDIVIDUAL DIFFERENCES 1. Why to treat subjects as fixed effects. James S. Adelman. University of Warwick.

Running head: INDIVIDUAL DIFFERENCES 1. Why to treat subjects as fixed effects. James S. Adelman. University of Warwick. Running head: INDIVIDUAL DIFFERENCES 1 Why to treat subjects as fixed effects James S. Adelman University of Warwick Zachary Estes Bocconi University Corresponding Author: James S. Adelman Department of

More information

Still important ideas

Still important ideas Readings: OpenStax - Chapters 1 11 + 13 & Appendix D & E (online) Plous - Chapters 2, 3, and 4 Chapter 2: Cognitive Dissonance, Chapter 3: Memory and Hindsight Bias, Chapter 4: Context Dependence Still

More information

Conditional spectrum-based ground motion selection. Part II: Intensity-based assessments and evaluation of alternative target spectra

Conditional spectrum-based ground motion selection. Part II: Intensity-based assessments and evaluation of alternative target spectra EARTHQUAKE ENGINEERING & STRUCTURAL DYNAMICS Published online 9 May 203 in Wiley Online Library (wileyonlinelibrary.com)..2303 Conditional spectrum-based ground motion selection. Part II: Intensity-based

More information

MULTIPLE LINEAR REGRESSION 24.1 INTRODUCTION AND OBJECTIVES OBJECTIVES

MULTIPLE LINEAR REGRESSION 24.1 INTRODUCTION AND OBJECTIVES OBJECTIVES 24 MULTIPLE LINEAR REGRESSION 24.1 INTRODUCTION AND OBJECTIVES In the previous chapter, simple linear regression was used when you have one independent variable and one dependent variable. This chapter

More information

European Federation of Statisticians in the Pharmaceutical Industry (EFSPI)

European Federation of Statisticians in the Pharmaceutical Industry (EFSPI) Page 1 of 14 European Federation of Statisticians in the Pharmaceutical Industry (EFSPI) COMMENTS ON DRAFT FDA Guidance for Industry - Non-Inferiority Clinical Trials Rapporteur: Bernhard Huitfeldt (bernhard.huitfeldt@astrazeneca.com)

More information

Still important ideas

Still important ideas Readings: OpenStax - Chapters 1 13 & Appendix D & E (online) Plous Chapters 17 & 18 - Chapter 17: Social Influences - Chapter 18: Group Judgments and Decisions Still important ideas Contrast the measurement

More information

The lateralized readiness potential and response kinetics in response-time tasks

The lateralized readiness potential and response kinetics in response-time tasks Psychophysiology, 38 ~2001!, 777 786. Cambridge University Press. Printed in the USA. Copyright 2001 Society for Psychophysiological Research The lateralized readiness potential and response kinetics in

More information

Reliability, validity, and all that jazz

Reliability, validity, and all that jazz Reliability, validity, and all that jazz Dylan Wiliam King s College London Published in Education 3-13, 29 (3) pp. 17-21 (2001) Introduction No measuring instrument is perfect. If we use a thermometer

More information

REPRODUCTIVE ENDOCRINOLOGY

REPRODUCTIVE ENDOCRINOLOGY FERTILITY AND STERILITY VOL. 74, NO. 2, AUGUST 2000 Copyright 2000 American Society for Reproductive Medicine Published by Elsevier Science Inc. Printed on acid-free paper in U.S.A. REPRODUCTIVE ENDOCRINOLOGY

More information

Reliability of Ordination Analyses

Reliability of Ordination Analyses Reliability of Ordination Analyses Objectives: Discuss Reliability Define Consistency and Accuracy Discuss Validation Methods Opening Thoughts Inference Space: What is it? Inference space can be defined

More information

Comparative efficacy or effectiveness studies frequently

Comparative efficacy or effectiveness studies frequently Economics, Education, and Policy Section Editor: Franklin Dexter STATISTICAL GRAND ROUNDS Joint Hypothesis Testing and Gatekeeping Procedures for Studies with Multiple Endpoints Edward J. Mascha, PhD,*

More information

Readings: Textbook readings: OpenStax - Chapters 1 11 Online readings: Appendix D, E & F Plous Chapters 10, 11, 12 and 14

Readings: Textbook readings: OpenStax - Chapters 1 11 Online readings: Appendix D, E & F Plous Chapters 10, 11, 12 and 14 Readings: Textbook readings: OpenStax - Chapters 1 11 Online readings: Appendix D, E & F Plous Chapters 10, 11, 12 and 14 Still important ideas Contrast the measurement of observable actions (and/or characteristics)

More information

PSYCHOLOGY 300B (A01) One-sample t test. n = d = ρ 1 ρ 0 δ = d (n 1) d

PSYCHOLOGY 300B (A01) One-sample t test. n = d = ρ 1 ρ 0 δ = d (n 1) d PSYCHOLOGY 300B (A01) Assignment 3 January 4, 019 σ M = σ N z = M µ σ M d = M 1 M s p d = µ 1 µ 0 σ M = µ +σ M (z) Independent-samples t test One-sample t test n = δ δ = d n d d = µ 1 µ σ δ = d n n = δ

More information

Analysis of Environmental Data Conceptual Foundations: En viro n m e n tal Data

Analysis of Environmental Data Conceptual Foundations: En viro n m e n tal Data Analysis of Environmental Data Conceptual Foundations: En viro n m e n tal Data 1. Purpose of data collection...................................................... 2 2. Samples and populations.......................................................

More information

BASIC CONCEPTS IN RESEARCH AND DATA ANALYSIS

BASIC CONCEPTS IN RESEARCH AND DATA ANALYSIS 1 Chapter 1 BASIC CONCEPTS IN RESEARCH AND DATA ANALYSIS Introduction: A Common Language for Researchers... 2 Steps to Follow when Conducting Research... 2 The research question... 3 The hypothesis...

More information

INVESTIGATING FIT WITH THE RASCH MODEL. Benjamin Wright and Ronald Mead (1979?) Most disturbances in the measurement process can be considered a form

INVESTIGATING FIT WITH THE RASCH MODEL. Benjamin Wright and Ronald Mead (1979?) Most disturbances in the measurement process can be considered a form INVESTIGATING FIT WITH THE RASCH MODEL Benjamin Wright and Ronald Mead (1979?) Most disturbances in the measurement process can be considered a form of multidimensionality. The settings in which measurement

More information

Goodness of Pattern and Pattern Uncertainty 1

Goodness of Pattern and Pattern Uncertainty 1 J'OURNAL OF VERBAL LEARNING AND VERBAL BEHAVIOR 2, 446-452 (1963) Goodness of Pattern and Pattern Uncertainty 1 A visual configuration, or pattern, has qualities over and above those which can be specified

More information

Designs. February 17, 2010 Pedro Wolf

Designs. February 17, 2010 Pedro Wolf Designs February 17, 2010 Pedro Wolf Today Sampling Correlational Design Experimental Designs Quasi-experimental Design Mixed Designs Multifactioral Design Sampling Overview Sample- A subset of a population

More information

METHODS FOR DETECTING CERVICAL CANCER

METHODS FOR DETECTING CERVICAL CANCER Chapter III METHODS FOR DETECTING CERVICAL CANCER 3.1 INTRODUCTION The successful detection of cervical cancer in a variety of tissues has been reported by many researchers and baseline figures for the

More information

A Comparison of Three Measures of the Association Between a Feature and a Concept

A Comparison of Three Measures of the Association Between a Feature and a Concept A Comparison of Three Measures of the Association Between a Feature and a Concept Matthew D. Zeigenfuse (mzeigenf@msu.edu) Department of Psychology, Michigan State University East Lansing, MI 48823 USA

More information

Business Statistics Probability

Business Statistics Probability Business Statistics The following was provided by Dr. Suzanne Delaney, and is a comprehensive review of Business Statistics. The workshop instructor will provide relevant examples during the Skills Assessment

More information

Appendix III Individual-level analysis

Appendix III Individual-level analysis Appendix III Individual-level analysis Our user-friendly experimental interface makes it possible to present each subject with many choices in the course of a single experiment, yielding a rich individual-level

More information

A Brief (very brief) Overview of Biostatistics. Jody Kreiman, PhD Bureau of Glottal Affairs

A Brief (very brief) Overview of Biostatistics. Jody Kreiman, PhD Bureau of Glottal Affairs A Brief (very brief) Overview of Biostatistics Jody Kreiman, PhD Bureau of Glottal Affairs What We ll Cover Fundamentals of measurement Parametric versus nonparametric tests Descriptive versus inferential

More information

CHAPTER NINE DATA ANALYSIS / EVALUATING QUALITY (VALIDITY) OF BETWEEN GROUP EXPERIMENTS

CHAPTER NINE DATA ANALYSIS / EVALUATING QUALITY (VALIDITY) OF BETWEEN GROUP EXPERIMENTS CHAPTER NINE DATA ANALYSIS / EVALUATING QUALITY (VALIDITY) OF BETWEEN GROUP EXPERIMENTS Chapter Objectives: Understand Null Hypothesis Significance Testing (NHST) Understand statistical significance and

More information

A likelihood ratio test for mixture effects

A likelihood ratio test for mixture effects Journal Behavior Research Methods 26,?? 38 (?), (1),???-??? 92-16 A likelihood ratio test for mixture effects JEFF MILLER University of Otago, Dunedin, New Zealand Under certain circumstances, it is theoretically

More information

Glossary From Running Randomized Evaluations: A Practical Guide, by Rachel Glennerster and Kudzai Takavarasha

Glossary From Running Randomized Evaluations: A Practical Guide, by Rachel Glennerster and Kudzai Takavarasha Glossary From Running Randomized Evaluations: A Practical Guide, by Rachel Glennerster and Kudzai Takavarasha attrition: When data are missing because we are unable to measure the outcomes of some of the

More information

Are Retrievals from Long-Term Memory Interruptible?

Are Retrievals from Long-Term Memory Interruptible? Are Retrievals from Long-Term Memory Interruptible? Michael D. Byrne byrne@acm.org Department of Psychology Rice University Houston, TX 77251 Abstract Many simple performance parameters about human memory

More information

Bayesian Analysis by Simulation

Bayesian Analysis by Simulation 408 Resampling: The New Statistics CHAPTER 25 Bayesian Analysis by Simulation Simple Decision Problems Fundamental Problems In Statistical Practice Problems Based On Normal And Other Distributions Conclusion

More information

Hierarchy of Statistical Goals

Hierarchy of Statistical Goals Hierarchy of Statistical Goals Ideal goal of scientific study: Deterministic results Determine the exact value of a ment or population parameter Prediction: What will the value of a future observation

More information

USE AND MISUSE OF MIXED MODEL ANALYSIS VARIANCE IN ECOLOGICAL STUDIES1

USE AND MISUSE OF MIXED MODEL ANALYSIS VARIANCE IN ECOLOGICAL STUDIES1 Ecology, 75(3), 1994, pp. 717-722 c) 1994 by the Ecological Society of America USE AND MISUSE OF MIXED MODEL ANALYSIS VARIANCE IN ECOLOGICAL STUDIES1 OF CYNTHIA C. BENNINGTON Department of Biology, West

More information