Predictive Bias Correction for Sequential Quantitative Visual Assessments


Aletta Nonyane
Department of Primary Care and General Practice, University of Birmingham, Edgbaston, Birmingham, England, B15 2TT, U.K.

Chris Theobald
School of Mathematics, University of Edinburgh and Biomathematics and Statistics Scotland, The King's Buildings, Edinburgh, Scotland EH9 3JZ, U.K.
cmt@bioss.ac.uk

and

Alec Mann
Biomathematics and Statistics Scotland, The King's Buildings, Edinburgh, Scotland EH9 3JZ, U.K.
alec@bioss.ac.uk

January 6, 2005

Abstract

Bayesian predictive calibration is used here to correct visual assessment scores which are prone to subjective bias. It is applied to correct for bias in a long sequence of responses which are subject to carry-over and order effects, as well as autocorrelation. The method is illustrated in the context of experiments in which the scoring of crop leaves for disease severity is simulated. This correction is done under the assumption of a parametric regression model which relates the scores to the true values and some covariates. Assessors' precision in the scoring tasks is compared using a criterion based on the Shannon information contained in the predictive distribution of future scores.

Key Words: Bayesian inference; Predictive distribution; Visual assessment.

1 Introduction

Despite advances in automated image analysis, there are many quantitative visual perception tasks in agricultural and medical research for which trained human assessors are required. Examples are seen in plant breeding, where the susceptibility of large numbers of rival varieties to diseases such as mildew and leaf rust is assessed by recording the percentage of leaf area covered by lesions, and the stems of winter cereals may be assessed for damage from eyespot. Similar assessments are required for epidemiological and fungicide studies of plants, for estimating the proportions of quadrants covered by foliage, for measuring needle loss from evergreen trees caused by acid rainfall, and for scoring carcass fat depth in breeding experiments on meat animals. In other contexts, assessments may be required of the numbers of dots appearing in a defined area, for example plant bugs such as aphids on a leaf, and microorganisms on a plate.

Human judgements are always affected by the context within which they are made. Visual assessments carried out by humans are subject to bias from various sources: these include the order of the stimulus in a sequence and carry-over, the tendency for the judgement of one stimulus to be influenced by the previous one. For example, Parker et al. (1995) found that visual estimates of wheat disease severity were substantially biased and varied considerably over short time-scales: visual assessment errors were large enough to alter the conclusions of a comparison of fungicide treatments. In the context of measuring fat in the carcasses of meat animals, Fisher (1990) argued that differences between operators and between periods of assessment are causes for concern even when photographic standards are used. Morris and Rule (1988) found that for estimation of line length and numerosity the average score declined over a sequence of images, and Sawyer and Wesenstein (1994) and Ferris et al. (2001) reported evidence of carry-over in judgements of numerosity and percentage cover.

The effect of carry-over is said to be assimilative when the current score is biased towards the previous stimulus and contrastive when the score is biased away from the previous stimulus (Cross, 1973): the direction of any bias could, though, depend on the magnitude of the current stimulus. DeCarlo and Cross (1990) discussed different regression models for sequential effects when assessors are asked to give their estimates of stimulus intensity. Most recently, Ferris et al. (2001) discussed the carry-over effects seen in a visual assessment experiment similar to the one being studied here. They proposed modelling carry-over as a logistic function of the difference in stimulus intensity.

Automated systems for quantitative image measurement should not suffer from order and carry-over effects, but may not be practicable as alternatives to human judgement. In the context of crop disease assessment, where rapid measurement is required, objective assessment methods may be time consuming, expensive or impractical for field use, and may not permit repeated inspection of the same plants (Newton and Hackett, 1994).

Where there is no adequate alternative to visual assessment, two complementary approaches to reducing the bias of human assessors may be pursued.

(a) Assessors may be trained to improve their performance. For example, the computer-based training program DISTRAIN has been developed by Tomerlin and Howell (1988) to train assessors estimating disease severity on cereals: leaf images showing random amounts of damage are displayed on a computer monitor and feedback is provided about the true percentage cover after each response is entered.

(b) If individual biases can be treated as fairly consistent within the period in which assessments are required and if a sufficiently simple model can be fitted to them, biases can be largely removed by testing the assessors in circumstances which closely mimic the field situation: responses to sequences of images with known stimulus values can be used to estimate biases and to correct future responses by calibration. It may be advantageous to incorporate carry-over effects into the calibration, possibly together with autocorrelation in the responses.

We concentrate in this paper on the second approach, assuming that a parametric regression model which includes order and carry-over effects can be found to relate each assessor's responses to the true stimulus values. The assessors' performance is likely to change as they become more experienced, so calibration would need to be repeated over time.

In Section 2 of this paper we describe experiments carried out for two purposes:

1. to investigate the biases, including order and carry-over effects, of human assessors in tasks which mimic subjective scoring of crop disease severity;

2. to illustrate the adjustment of assessors' responses in a future task of the same type using Bayesian prediction.

The design used, as well as the data obtained and the model fitted, are also discussed in Section 2. Section 3 considers calibration for the two tasks. Section 4 discusses the evaluation of the assessors' precision, based on their performance in the calibration experiment. This precision is measured by the amount of Shannon information contained in the predictive distribution of future scores. The overall discussion is given in Section 5.

2 The Visual Assessment Experiments

Twenty-five undergraduate students aged between 18 and 24 and following statistics modules took part in experiments intended to resemble the visual scoring of disease severity. They were unpaid, but prizes were offered for the three most accurate sets of responses. The experiments were developed from those of Ferris et al. (2001), and comprised two tasks, which we refer to below as the cover and count tasks. Examples of the images used are given in Figures 1 and 2.

In the cover task, images comprising possibly overlapping black circular disks of different sizes within a white square were displayed on a computer monitor, and subjects were required to give estimates of the percentages of the square which were covered by the disks in a sequence of images. Each image was displayed for 6 seconds above a slider for recording the assessor's responses. This was graduated from 0 to 100 and controlled by a computer mouse. Before recording began, 9 images were displayed for 6 seconds each with stimuli in the same range as those used for the task itself. Subjects were allowed to enter a response during these intervals, after which the true percentage cover was shown until the subject pressed the return key to view the next one. No feedback was given once recording had begun.

The count task was similar, except that the images comprised disjoint black circular disks of the same size, and estimates were required of the numbers of disks. The slider was graduated from 0 to 250, but to avoid the impression of an upper limit it was extended to the edge of the screen, at about 300. The order of the two tasks was chosen at random for each assessor, and they were allowed an unlimited break between them: the average break lasted 3 minutes.

The images in the cover task were generated before the experiment was performed by choosing points at random in a square of pixels to be centres of circular disks with radii taken from a uniform distribution on the interval (5, 13). For the count task, the disks had common radius 10 pixels, and points were rejected if disks centred on them would overlap a side of the square or a previously generated disk.

2.1 Design of the image sequences

The sequence of images presented to each individual may be regarded as forming a changeover design with an unusually large number of periods. Since our interest was to investigate the effects of the current and previous images and the effect of order, the magnitudes of the stimuli in these sequences were chosen to be approximately balanced for these effects. One way to achieve approximate balance would be to select a range of values for the amounts of cover or the counts, choose a probability distribution over this range, and sample randomly from this distribution for successive images: this method was used, for example, by DeCarlo and Cross (1990) for magnitude estimation of sound stimuli. However, to achieve more exact balance we chose to define nominal stimulus levels corresponding to disjoint intervals within the selected ranges.

With $m$ distinct levels, we seek to define a sequentially balanced sequence of $m^2 + 1$ levels comprising a single leading level followed by $m$ blocks, each block containing a permutation of the $m$ levels. Sequential balance means that all of the $m^2$ ordered pairs of levels occur once each. The following is an example of such a sequence with 7 nominal levels denoted by the letters A to G and blocks separated by spaces.

A ABCDEFG GADCFBE EGFDBAC CGBFAED DFEAGCB BDGECAF FCEBGDA

Notice that the $m$ self-adjacencies must occur at the beginning of the sequence and at the boundaries of the blocks, and that the use of the blocks of permutations provides approximate balance between the order and both the current and the previous stimulus effects. The response to the first image may be ignored in the analysis since it includes no carry-over effect.

This type of design was proposed by Finney and Outhwaite (1956) for bioassay studies with several treatments. They observed that no such sequences are available for $m$ equal to 3, 4 and 5. Sampford (1957) gave examples with $m$ from 6 to 11, but provided no theorems on existence and no general methods of construction. For $m$ equal to 6 and 7, we have systematically enumerated 324 and 175,588 sequences respectively with the first block written in natural order: our methods and results will be reported elsewhere. In principle, one might choose a sequence at random from all those available: we merely selected at random from the four then known to us with 7 letters.

To generate a sequence of images from a given sequence of levels, the $m$ nominal levels were allocated at random to $m$ intervals for the appropriate type of stimulus, and images were chosen with stimulus values in the corresponding intervals: no image was presented to a subject more than once. For our experiment, $m$ was taken to be 7, giving a sequence of 50 images. In the cover task, the majority of images were shown with percentage cover less than 50 to reflect the values likely to be of interest in crop disease assessment: the intervals used were 5 ± 0.5, 10 ± 1.0, 17 ± 1.7, 26 ± 2.6, 37 ± 3.7, 50 ± 5.0 and 65 ± 6.5. The intervals used for count similarly included values within one tenth of 7 central values, these values being 17, 27, 40, 60, 90, 135 and 200. The smallest central value was taken as 17 since lower numbers of disks might be counted individually rather than estimated.

For each task two sequences of 50 images each were displayed with no break between them. For convenience, the leading image in the second sequence was included, although this made the overall sequence slightly unbalanced. Thus each combination of assessor and task resulted in 100 responses, including any missing ones. The first response is ignored, and those analysed are numbered from 1 to 99.

2.2 Data from the two tasks

The top-left plot in Figure 3 shows the responses of all the assessors in the cover task against the true amounts of cover, along with the line of equality. This plot suggests that there is some upward bias at low amounts of cover, and shows a few outlying observations. As might be expected with proportions, there is an initial increase in variability with the true percentage cover. The bottom-left plot in this figure shows responses from these subjects in the count task. They show both variability and downward bias increasing with true count.

Logarithmic transformations are commonly applied to magnitude-scaling stimuli and responses (DeCarlo and Cross, 1990), sometimes in the hope that response is roughly proportional to a power of stimulus intensity. This transformation is not appropriate for percentage cover because we can expect the responses to become more accurate when the true cover approaches 100%: a logit transformation of both axes seems more natural.
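The two transformations discussed here can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the use of natural logarithms is an assumption (the text does not state the base), and scores of exactly 0 or 100 are rejected because their logits are not finite.

```python
import math

def logit_percent(p):
    """Logit of a percentage cover p, defined for 0 < p < 100."""
    if not 0.0 < p < 100.0:
        raise ValueError("logit is finite only for 0 < p < 100")
    return math.log(p / (100.0 - p))

def inverse_logit_percent(z):
    """Map a value on the logit scale back to a percentage."""
    return 100.0 / (1.0 + math.exp(-z))

def log_count(n):
    """Natural log of a positive count (assumed base; see lead-in)."""
    if n <= 0:
        raise ValueError("log is finite only for n > 0")
    return math.log(n)

# Central values of the cover intervals quoted in Section 2.1.
cover_levels = [5, 10, 17, 26, 37, 50, 65]
logit_levels = [logit_percent(p) for p in cover_levels]
```

On the logit scale a cover of 50% maps to zero, so most of the cover levels used in the experiment are negative after transformation.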

The top-right plot in Figure 3 shows the same data after this transformation: it seems to have been reasonably successful in stabilizing the variance and achieving linearity, so we analyse the cover data on the logit scale. The log-transformed data from the count task in the bottom-right plot show roughly constant variance, so we use this scale for analysing the counts.

In the context of examining leaves for disease, we might expect to find some with no lesions, corresponding to a logit response of $-\infty$. Any linear adjustment on this scale would leave this value unchanged, so that assessors are assumed to identify undamaged leaves correctly, and a correction would be applied only for leaves given positive responses.

To assess the validity of the linearity and equal-variance assumptions for each task, residuals were examined from the regression of transformed response on sequence order and transformed current and previous stimuli for each assessor. For the cover task, the residuals tended to be positive when the logit of the current stimulus was positive, that is, when cover exceeded 50%. For the count task, the residuals for several assessors showed slight curvature when plotted against current stimulus. For both tasks, the effect of order appeared to be slightly non-linear for a few assessors, but there was no dependence on previous stimulus: occasional gross errors occurred, but there was no consistent non-normality.

From now on, we take stimulus and response to refer to the data after any transformation intended to induce linearity and homoscedasticity in the relationship.
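Before turning to the model, the sequential-balance property of the 7-level example sequence in Section 2.1 can be checked mechanically. The sketch below is an illustration only (the function name is hypothetical, and it is not the enumeration method the authors say will be reported elsewhere): it verifies that every ordered pair of adjacent levels occurs exactly once and that each block after the leading level is a permutation of the levels.

```python
from collections import Counter

def check_sequential_balance(sequence):
    """Return (pairs_ok, blocks_ok) for a sequence of m*m + 1 level labels."""
    levels = sorted(set(sequence))
    m = len(levels)
    if len(sequence) != m * m + 1:
        return False, False
    # Every ordered pair of adjacent levels must occur exactly once.
    pair_counts = Counter(zip(sequence, sequence[1:]))
    pairs_ok = len(pair_counts) == m * m and set(pair_counts.values()) == {1}
    # After the leading level, each block of m must be a permutation.
    body = sequence[1:]
    blocks = [body[i:i + m] for i in range(0, m * m, m)]
    blocks_ok = all(sorted(block) == levels for block in blocks)
    return pairs_ok, blocks_ok

# The example sequence from Section 2.1, with the block spacing removed.
example = ("A ABCDEFG GADCFBE EGFDBAC CGBFAED "
           "DFEAGCB BDGECAF FCEBGDA").replace(" ", "")
pairs_ok, blocks_ok = check_sequential_balance(example)
print(pairs_ok, blocks_ok)  # prints: True True
```

The same function can be applied to any candidate sequence of single-character labels, for example one drawn at random from an enumerated set.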

2.3 Statistical model

We consider predictive calibration under a homoscedastic normal linear regression model which includes carry-over, order effects and autocorrelation, allowing the expectation $\mu_t$ of the $t$th response, $y_t$ say, to be a linear function of the current stimulus $x_t$, the previous stimulus $x_{t-1}$ and $t$ itself. The latter effect might arise from fatigue or be a learning effect. The expected responses for a particular subject may therefore be expressed as

$$\mu_t = E(y_t \mid x_t, x_{t-1}, t, \beta) = \beta_0 + \beta_1 (x_t - \bar{x}) + \beta_2 (x_{t-1} - \bar{x}) + \beta_3 (t - \bar{t}), \qquad t = 1, 2, \ldots, n, \tag{1}$$

where $\beta = [\beta_0\ \beta_1\ \beta_2\ \beta_3]^T$ is a vector of unknown regression coefficients and $t$ indexes the $n$ responses following the initial one; $\bar{x}$ is the mean stimulus level and $\bar{t}$ is $(n+1)/2$. The intercept is assumed to be the overall expected level of transformed cover or count, and to minimise its correlation with other coefficients, the predictors are centred about their means. Lastly, the responses are assumed to follow an AR(1) process defined by

$$y_t - \mu_t = \rho\,(y_{t-1} - \mu_{t-1}) + u_t, \qquad t = 2, 3, \ldots, n, \tag{2}$$

where $\rho$ is the autocorrelation parameter and the $u_t$ are taken to be independent and $N(0, \sigma^2)$. Using the Markov property of this process, the joint distribution of $y_1, \ldots, y_n$ is expressible as the product of the marginal distribution of $y_1$, $N(\mu_1, \sigma^2/(1 - \rho^2))$, and the conditional distributions $N(\mu_t + \rho(y_{t-1} - \mu_{t-1}), \sigma^2)$ of the $y_t$ given the previous responses ($t = 2, 3, \ldots, n$). This model will henceforth be referred to as Model 1.

Fitting the above model to the data from the 25 assessors showed all parameters to be significant for both tasks ($p < 0.001$), except for the carry-over effect in the cover task, which was non-significant for all assessors. The correlation between $\beta_0$ and the other coefficients has been reduced by centring the independent variables about their respective means. This makes it possible to assume that these coefficients are independent when specifying their priors for the Bayesian predictive calibration.

3 Predictive Calibration with Carry-over and Covariates

Aitchison and Dunsmore (1975) and Aitchison (1977) make a persuasive case that statistical calibration should be carried out within a Bayesian predictive framework. Aitchison and Dunsmore (1975) consider calibration only for a single future response, but the inclusion of carry-over effects and autocorrelation in our regression of responses on stimuli requires that calibration be performed simultaneously for a sequence of future responses. We therefore generalize the predictive calibration procedure of Aitchison and Dunsmore (1975) to allow for carry-over, for covariates and also for correlated responses, as in (1) and (2), and illustrate the use of Markov chain Monte Carlo to apply the more general calibration procedure to sequences of human visual assessments.

In the following, probability density functions are denoted by $p$, and the argument of $p$ indicates the random variable being considered. The predictive method assumes that we have data from a calibration experiment on an assessor in which a response is recorded at selected values of a stimulus. Let $x$ and $y$ denote the vectors of stimuli and responses for the assessor. We also want to allow the distribution of $y$ to depend on the sequence order of the stimulus, so let $t$ denote a vector defining this order. The regression model specifies the probability density function $p(y \mid x, t, \theta)$ of $y$ given $x$, $t$ and a vector $\theta$ of unknown parameters. Given a prior density $p(\theta)$ for $\theta$, the posterior density $p(\theta \mid x, t, y)$ of $\theta$ is proportional to $p(\theta)\, p(y \mid x, t, \theta)$.

We then assume vectors $y^*$ and $t^*$ to be recorded for the same assessor in order to estimate the unknown vector of stimuli $x^*$. The vectors $t$ and $t^*$ could be generalized to include any other covariates whose values are to be specified in the calibration experiment and in the future. The predictive density function of $y^*$ is then given by

$$p(y^* \mid x, t, y, x^*, t^*) = \int p(y^* \mid x^*, t^*, \theta)\, p(\theta \mid x, t, y)\, d\theta. \tag{3}$$

Aitchison and Dunsmore (1975) distinguish designed and natural calibration experiments. In the former, $x$ is chosen to cover whatever range of stimuli is of interest; in the latter, $x$ is assumed to be drawn from the same population as $x^*$, so that $x$ provides information on the prior distribution of $x^*$. Ours is a designed calibration experiment, so it is necessary to specify a prior density $p(x^*)$ for $x^*$ which is not dependent on $x$ or $\theta$. In the context of assessing disease lesions on leaves, this prior might be influenced by the amounts of such disease recorded in previous seasons and an overall impression of the extent of disease in the current season. For fairness, it should not depend on the crop variety.

With a designed experiment, inferences about $x^*$ are based on the calibrative density $p(x^* \mid x, t, y, t^*, y^*)$, which may be derived as follows from the joint density of $x^*$ and $\theta$ given $x$, $t$, $y$, $t^*$ and $y^*$. Assuming that $x^*$ and $\theta$ are independent a priori, and that $y$ and $y^*$ are independent given $x$, $t$, $x^*$, $t^*$ and $\theta$, we have

$$\begin{aligned}
p(x^*, \theta \mid x, t, y, t^*, y^*) &\propto p(x^*, \theta)\, p(y, x, y^* \mid t, t^*, x^*, \theta) \\
&\propto p(x^*)\, p(\theta)\, p(y \mid x, t, \theta)\, p(y^* \mid x^*, t^*, \theta) \\
&\propto p(x^*)\, p(y^* \mid x^*, t^*, \theta)\, p(\theta \mid x, t, y),
\end{aligned}$$

where the proportionality is in $x^*$ and $\theta$. Integrating with respect to $\theta$ and using (3) gives

$$p(x^* \mid x, t, y, t^*, y^*) \propto p(x^*)\, p(y^* \mid x, t, y, x^*, t^*). \tag{4}$$
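To make (4) concrete, the following sketch evaluates a calibrative density on a grid for a deliberately simplified case: a single future response, no carry-over or order term, and regression parameters treated as known (a plug-in shortcut, so the integration over $\theta$ in (3) is skipped). These simplifications, the function names and all numerical values are assumptions made for illustration, not the procedure or results of the paper.

```python
import math

def normal_pdf(v, mean, sd):
    """Density of N(mean, sd^2) at v."""
    z = (v - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def calibrate_on_grid(y_new, beta0, beta1, x_bar, sigma,
                      prior_mean, prior_sd, grid):
    """Posterior mean and sd of x* from prior(x*) * p(y* | x*) on a grid."""
    weights = [normal_pdf(x, prior_mean, prior_sd)
               * normal_pdf(y_new, beta0 + beta1 * (x - x_bar), sigma)
               for x in grid]
    total = sum(weights)
    probs = [w / total for w in weights]  # normalise, as in (4)
    mean = sum(p * x for p, x in zip(probs, grid))
    var = sum(p * (x - mean) ** 2 for p, x in zip(probs, grid))
    return mean, math.sqrt(var)

# Invented values on a transformed scale, purely for illustration.
grid = [-6.0 + 0.001 * i for i in range(12001)]
mean, sd = calibrate_on_grid(y_new=-1.0, beta0=-0.8, beta1=0.9, x_bar=-1.0,
                             sigma=0.4, prior_mean=-1.0, prior_sd=1.5,
                             grid=grid)
```

In this normal-normal special case the grid answer can be compared with the closed-form posterior, which is a useful sanity check before moving to the full model, where the calibrative distribution has no simple form.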

Expressions (3) and (4) represent a generalization of the calibrative method for designed experiments described on page 189 of Aitchison and Dunsmore (1975) to include a covariate and vector (rather than scalar) future stimuli and responses. Although this extension is straightforward in theory, iterative methods are likely to be required in practice for calculating the calibrative distribution of $x^*$ except in special cases. We follow the modern practice of representing our model by a directed acyclic graph whose nodes correspond to the data and the model parameters, as discussed in Gilks et al. (1996).

3.1 Implementation

The WinBUGS program (Spiegelhalter et al., 2003) can be used to fit the autoregressive model above to the data provided by each assessor, and thus approximate the posterior distribution of $\beta_0$, $\beta_1$, $\beta_2$, $\beta_3$, $\sigma$ and $\rho$ for that individual. The program, which is freely available, allows models to be specified graphically. The user defines the conditional distribution for each stochastic node given the values of its parents, and the program selects and implements an appropriate Markov chain Monte Carlo method. The output includes convergence diagnostics, summary statistics and kernel density estimates of the posterior probability density functions for any node.

The graphical representation of the model can be extended to include nodes for future response vectors $y^*$ recorded on the same individual and for the corresponding stimuli $x^*$. Unknown future stimuli are treated as parameters, and the calibrative distribution of any element of $x^*$ is its marginal posterior distribution.
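The node-by-node specification described above mirrors the factorization of Model 1 given in Section 2.3: a marginal normal term for the first response and a conditional normal term for each later one. The sketch below writes that log joint density in plain Python. It is not WinBUGS code and not the authors' program; the function names are hypothetical, and missing responses are not handled.

```python
import math

def normal_logpdf(v, mean, var):
    """Log density of N(mean, var) at v."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (v - mean) ** 2 / var)

def expected_responses(x, x_prev, beta, x_bar):
    """Equation (1): x[i] is the current and x_prev[i] the previous stimulus."""
    n = len(x)
    t_bar = (n + 1) / 2.0
    b0, b1, b2, b3 = beta
    return [b0 + b1 * (x[i] - x_bar) + b2 * (x_prev[i] - x_bar)
            + b3 * ((i + 1) - t_bar) for i in range(n)]

def ar1_log_likelihood(y, mu, rho, sigma):
    """Log joint density of y under the AR(1) process of equation (2)."""
    var = sigma ** 2
    # Marginal term for the first response: N(mu_1, sigma^2 / (1 - rho^2)).
    total = normal_logpdf(y[0], mu[0], var / (1.0 - rho ** 2))
    # Conditional terms: N(mu_t + rho * (y_{t-1} - mu_{t-1}), sigma^2).
    for t in range(1, len(y)):
        cond_mean = mu[t] + rho * (y[t - 1] - mu[t - 1])
        total += normal_logpdf(y[t], cond_mean, var)
    return total
```

Adding log prior terms for the coefficients, $\sigma$ and $\rho$ to this quantity would give an unnormalised log posterior of the kind that a general-purpose MCMC sampler explores.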

3.2 Predictive calibration for the two visual tasks

To illustrate predictive calibration for correcting the bias in an individual's assessments, we consider data recorded by one assessor who completed the two tasks on two occasions one week apart. The values of the stimuli are known for both occasions, but those given on the second occasion are treated as unknown for the predictive analysis, so that we can examine the nature and magnitude of any improvement arising from the calibration.

The area of application influences the choice of the prior distribution for $x^*$ and the range of stimuli for a designed calibration experiment. Prior distributions for severity of plant disease may be determined by the plant pathologist's observation of disease severity during the current and previous seasons. Here the Normal prior was based on the calibration experiment, with mean equal to that of $x$ in the experiment, but with twice the standard deviation to allow for the possibility of higher variability in future scores.

Normal prior distributions were assumed for the coefficients $\beta_0$, $\beta_1$, $\beta_2$ and $\beta_3$. Two possibilities for the values of their prior expectations were considered. First, it may be assumed a priori, for Model 1, that the assessor's expected bias, carry-over effects and order effects are zero. The prior expectations for $\beta_0$, $\beta_1$, $\beta_2$ and $\beta_3$ would then be $\bar{x}$, 1, 0 and 0, respectively. This will be referred to as Prior 1. Another option would be to base prior expectations and variances of the coefficients on their estimates after fitting Model 1 to the data from the experiment described above. These priors are shown in Table 1 and will be referred to as Prior 2. In both cases, the correlation parameter $\rho$ has a beta prior, centred around 0 because very little autocorrelation was exhibited when fitting Model 1. It is assumed that, as in the design of the experiment, $x^*_0$ comes from the same stimulus interval as $x^*_1$.

The prior distribution for $\sigma^2$ was taken to be inverse gamma, and specified by assuming that $d s^2 / \sigma^2$ has a $\chi^2(d)$ distribution, where $s^2$ is a prior estimate of $\sigma^2$ and the degrees of freedom $d$ reflect the precision of the estimate. The values of these are taken from the analysis of data from the above experiment.

The calibration method is illustrated here for 5 assessors who did not take part in the experiment above, but carried out each task on two occasions for the purpose of calibration. The first two plots in the upper row of Figure 4 show the logits of the true and recorded cover for one of the assessors (labelled C later) on the first and second occasions; the third plot shows the posterior expected logit cover against logit true cover on the second occasion. The lower row of Figure 4 shows the corresponding plots for the count task. Note that posterior expected stimuli may be calculated for missing responses, but that these are omitted from the plots.

One measure of effectiveness of the calibration method for any assessor is the ratio of the mean square errors of their recorded and calibrated responses. These mean square errors are given in Tables 2 and 3; the values under the heading Model 2 are defined later. Moreover, if calibration is effective, we expect to see closer agreement with the line of equality in the third plot in each row than in the second. This is more clearly the case for the cover task than for the count task.

These plots illustrate one drawback of this method, namely its reliance on the assumption of consistent bias between the first and second performances. When this assumption does not hold, as in the count task where the plots show more bias in the second performance than in the first, calibration does not improve the scores of the second performance as much. Similarly, for the cover task, the first occasion shows both over- and under-estimation whereas the second shows under-estimation only. Hence, although there is much more correction in this task, there is some over-compensation as well.
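The mean-square-error comparison described above is simple to compute once recorded responses, calibrated responses (posterior expected stimuli) and true stimuli are on the same transformed scale. The sketch below uses hypothetical function names and invented toy numbers; it does not reproduce any value from Tables 2 and 3.

```python
def mean_square_error(estimates, truths):
    """Mean of squared differences between estimates and true stimuli."""
    if len(estimates) != len(truths) or not truths:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((e - t) ** 2 for e, t in zip(estimates, truths)) / len(truths)

def mse_ratio(recorded, calibrated, truths):
    """Ratio of recorded to calibrated MSE; values above 1 favour calibration."""
    return (mean_square_error(recorded, truths)
            / mean_square_error(calibrated, truths))

# Invented numbers on a transformed scale, for illustration only.
truths = [0.0, 1.0, 2.0]
recorded = [1.0, 2.0, 3.0]      # consistently one unit too high
calibrated = [0.5, 1.0, 2.5]    # most of that bias removed
ratio = mse_ratio(recorded, calibrated, truths)
```

With these toy values the recorded responses have a mean square error of 1 and the calibrated ones 1/6, so the ratio is 6.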

The calibration procedure was tested for robustness to changes in the prior distribution of the future stimulus level $x^*$. It was not robust to this change in either task. In particular, when the prior variance of $x^*$ was halved, the calibrated responses, which are the posterior expected stimuli, were drawn towards the mean, hence introducing bias at the extreme levels of the stimulus.

Another test of the calibration procedure was done by changing the regression model assumed for calibration. The carry-over from the previous stimulus level and the order effects were removed, while still assuming that errors were correlated, following an AR(1) model. This resulted in the model with expectation

$$\mu_t = \beta_0 + \beta_1 (x_t - \bar{x}), \tag{5}$$

hence

$$y_t = \mu_t + \varepsilon_t \tag{6}$$

with

$$\varepsilon_t = \rho\, \varepsilon_{t-1} + u_t. \tag{7}$$

This model, referred to as Model 2 henceforth, was fitted using Prior 1 because this prior was seen to perform better than Prior 2. The mean square errors under this model are given in the last columns of Tables 2 and 3.

The result of this change in the model reflects what was shown by the analysis of variance for the two tasks in the experiment. For the count task, analysis of variance showed carry-over from the previous image level to be significant. Thus, calibration under a model without this carry-over term (Model 2) does not improve the responses as well as calibration under the model with the carry-over term. Hence the mean square errors of calibrated responses under Model 2 are higher than those under Model 1. On the other hand, for the cover task, carry-over from the previous image level was not significant, as shown by the analysis of variance results. Thus, calibration under a model without this carry-over term (Model 2) improves the responses (for 4 out of 5 assessors) more than calibration under the model with the carry-over term. Hence the mean square errors of calibrated responses under Model 2 are lower than those under Model 1.

4 Assessing the Assessors

A researcher who has to address a quantitative visual assessment task of the sort referred to in the Introduction might recruit some potential assessors, instruct them using examples of the type of assessment required, test the accuracy of each candidate on a set of objects or images with carefully measured stimulus values and then offer employment to the candidates whose responses appeared to be most precise.

In a frequentist comparison of the accuracy of the assessors' results, one might compare the mean square errors of their responses relative to the stimulus values: if linear bias correction is allowed, the accuracy of assessors might then be compared using their coefficients of determination, $R^2$. For a Bayesian analysis, the quality of an assessor could be measured with reference to some notional future sequence of stimuli $x^*$. As in Spezzaferri (1985), we consider the expected gain in Shannon information about $x^*$ arising from the calibration experiment. This expected gain compares the probability density of $x^*$ conditional on $x$, $y$ and $y^*$ with the unconditional (prior) density: it depends on the number of stimuli, and is given by

$$\int\!\!\int p(y^* \mid x, y)\, p(x^* \mid x, y, y^*) \ln\left\{ \frac{p(x^* \mid x, y, y^*)}{p(x^*)} \right\} dx^*\, dy^*. \tag{8}$$

It is shown in the Appendix that for a designed calibration experiment, expression (8) may be approximated by

$$\frac{1}{2}\left[ \ln \det\{V(y^* \mid x, y)\} - \int p(x^*) \ln \det\{V(y^* \mid x, y, x^*)\}\, dx^* \right], \tag{9}$$

where $V$ denotes a variance matrix. For a scalar future response, (9) becomes the expectation, over the prior distribution of $x^*$, of $\frac{1}{2} \ln\{V(y^* \mid x, y) / V(y^* \mid x, y, x^*)\}$. Large values indicate that responses could be predicted accurately from future stimulus values.

The numbers of notional future stimuli and responses to be considered might reflect the number of assessments typically made in a sequence. Potentially, the ranking of the assessors depends on this choice, but for simplicity we consider a sequence comprising two stimuli, $x^*_0$ and $x^*_1$ say, and the corresponding responses $y^*_0$ and $y^*_1$, the first of which is ignored as before. Again, we take $x^*_0$ and $x^*_1$ to be independent and from a common prior distribution with density $p(x^*)$. The flexibility in defining this prior distribution is what makes the Bayesian approach more desirable than the use of $R^2$, for example when high precision in a particular interval of the stimulus scale is emphasized.

The evaluation of the second term in (9) using Markov chain Monte Carlo appears to require the generation of responses for every combination of values of $x^*_0$ and $x^*_1$. It is therefore convenient to approximate the prior distribution for $x^*$ by a discrete distribution: we may calculate the $r$ quantiles of $p(x^*)$ for some positive integer $r$ and give equal probability to each of them. Then values of $y^*_1$ may be generated from the posterior predictive distribution for each of the $r^2$ combinations of $(x^*_0, x^*_1)$, $V(y^*_1 \mid x, y, x^*_0, x^*_1)$ may be estimated from the set of generated values for each combination, and $\ln V(y^*_1 \mid x, y, x^*_0, x^*_1)$ averaged over these combinations. The variance $V(y^* \mid x, y)$ in the first term of (9) may be estimated from the combined set of values of $y^*_1$.

The ranking of assessors according to the information criterion in (9) is illustrated here for the count task carried out by the first 25 assessors. The plots of the individual transformed data sets are given in Figure 6 so that the information criterion can be compared with the level of agreement between responses and true counts. As mentioned earlier, $R^2$ provides a frequentist measure of assessor performance: this is plotted against the values of the information criterion (9) in Figure 5, showing good agreement in the ranking of assessors. Both, in turn, agree with the plots of observed individual data in Figure 6. As an example, the responses of assessor 19 agree well with the true values and this assessor is ranked as the most accurate by the two criteria. Assessors 12 and 13 are the worst performers.

5 Discussion

The use of a design balanced for carry-over effects in the visual assessment experiments allows for effective estimation of carry-over and order effects, and this is discussed further in Nonyane (2004). Bayesian predictive calibration was successfully generalised to calibration for a vector of responses with carry-over and order effects, as well as autoregressive errors, and implementation of this was made possible by the availability of Bayesian MCMC methods. Implementing this method in the Bayesian framework has the advantages that prior information about the parameters can be incorporated, and that predictions can be made for missing responses. In our examples the calibration improved the estimation of the stimuli, as measured by the mean square errors in Tables 2 and 3. It relies, though, on the assumption that bias stays consistent between the two successive experiments. This problem may be reduced by testing assessors repeatedly over time.

The measure of Shannon information in the predictive distribution of future responses, though not so easy to implement, may be used for selecting or ranking assessors. Ranking assessors by $R^2$ appears to give similar results, its only drawback being the inability to incorporate one's prior belief about future stimuli.

Acknowledgements

The work of the first author was supported by the Cecil Renaud Charitable and Educational Trust for PhD studies at the University of Edinburgh. We are grateful to the late Rob Kempton for suggesting this study and to Dirk Husmeier for useful discussions.

Appendix A: Approximation to the expected gain in information about a future sequence of stimuli

We wish to approximate the expected gain in Shannon information about a future sequence of stimuli $x^*$ which arises from the calibration experiment, given by
$$\iint p(y^* \mid x, y)\, p(x^* \mid x, y, y^*) \ln\left\{ \frac{p(x^* \mid x, y, y^*)}{p(x^*)} \right\} dx^*\, dy^*. \qquad (10)$$
Since we have a designed calibration experiment, $p(x^* \mid x, y)$ is equal to $p(x^*)$, so that
$$p(y^* \mid x, y)\, p(x^* \mid x, y, y^*) = p(x^*, y^* \mid x, y) = p(y^* \mid x, y, x^*)\, p(x^*).$$
Thus (10) becomes
$$\int p(x^*) \int p(y^* \mid x, y, x^*) \ln\{p(y^* \mid x, y, x^*)\}\, dy^*\, dx^* - \iint p(x^*, y^* \mid x, y) \ln\{p(y^* \mid x, y)\}\, dx^*\, dy^*$$
or
$$\int p(x^*) \int p(y^* \mid x, y, x^*) \ln\{p(y^* \mid x, y, x^*)\}\, dy^*\, dx^* - \int p(y^* \mid x, y) \ln\{p(y^* \mid x, y)\}\, dy^*. \qquad (11)$$
For the case of Normal linear regression with a single stimulus $x^*$ and response $y^*$, and suitable prior distributions, $p(y^* \mid x, y, x^*)$ is a Student density (Aitchison and Dunsmore, 1975). In other cases, (11) may be evaluated for any assessor by approximating the distributions of $y^*$ given $x, y, x^*$ and of $y^*$ given only $x, y$ using Normal distributions with the appropriate first and second moments. Then the two terms in (11), including the sign of the second, become
$$-\frac{q}{2}\{\ln(2\pi) + 1\} - \frac{1}{2}\int p(x^*) \ln\det\{V(y^* \mid x, y, x^*)\}\, dx^*$$
and
$$\frac{q}{2}\{\ln(2\pi) + 1\} + \frac{1}{2}\ln\det\{V(y^* \mid x, y)\},$$
where $q$ is the dimension of $y^*$ and $V$ denotes a variance matrix. Thus (11) is approximately
$$\frac{1}{2}\left[ \ln\det\{V(y^* \mid x, y)\} - \int p(x^*) \ln\det\{V(y^* \mid x, y, x^*)\}\, dx^* \right]. \qquad (12)$$
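The approximation above, combined with the discretised prior described in the main text, can be sketched in a few lines of Python. This is only an illustration under strong assumptions, not the authors' implementation: the function `simulate_y1` is a hypothetical stand-in for a draw from the posterior predictive distribution (the paper obtains such draws by MCMC), the regression coefficients and residual standard deviation are treated as known, autocorrelation is ignored, the prior for the future stimulus is taken to be Normal, and the quantiles are placed at probabilities $(i + 0.5)/r$, which is one possible convention.

```python
import math
import random
from statistics import NormalDist


def variance(values):
    """Population variance of a list of numbers."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def prior_quantiles(mean, sd, r):
    """r equally probable support points approximating a Normal prior.

    Assumed convention: quantiles at probabilities (i + 0.5) / r.
    """
    dist = NormalDist(mean, sd)
    return [dist.inv_cdf((i + 0.5) / r) for i in range(r)]


def simulate_y1(x0, x1, beta, sigma, rng):
    """Hypothetical stand-in for one predictive draw of the second response.

    beta = (intercept, coefficient for current level, coefficient for
    previous level); parameter uncertainty is ignored in this sketch.
    """
    b0, b1, b2 = beta
    return b0 + b1 * x1 + b2 * x0 + rng.gauss(0.0, sigma)


def information_criterion(mean, sd, beta, sigma, r=5, n_draws=2000, seed=1):
    """Monte Carlo estimate of the scalar form of the criterion."""
    rng = random.Random(seed)
    xs = prior_quantiles(mean, sd, r)
    pooled = []
    mean_log_cond_var = 0.0
    for x0 in xs:          # previous stimulus
        for x1 in xs:      # current stimulus
            draws = [simulate_y1(x0, x1, beta, sigma, rng)
                     for _ in range(n_draws)]
            # ln V(y1 | x0, x1), averaged over the r^2 combinations
            mean_log_cond_var += math.log(variance(draws)) / (r * r)
            pooled.extend(draws)
    # First term: ln V(y1) estimated from the combined set of draws
    return 0.5 * (math.log(variance(pooled)) - mean_log_cond_var)


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the paper.
    print(information_criterion(0.0, 1.0, (0.0, 1.0, 0.1), 0.5))
```

In this simplified setting the conditional variance is $\sigma^2$ for every combination, so the estimate can be checked against $\tfrac{1}{2}\ln\{1 + (\beta_1^2 + \beta_2^2) v / \sigma^2\}$, where $v$ is the variance of the discretised prior. With posterior predictive draws from a fitted model, only `simulate_y1` would need to change.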

References

Aitchison, J. and Dunsmore, I. R. (1975). Statistical Prediction Analysis. Cambridge University Press.
Aitchison, J. (1977). A calibration problem in statistical diagnosis: The system transfer problem. Biometrika, 64:
Cross, D. V. (1973). Sequential dependencies and regression in psychophysical judgements. Perception and Psychophysics, 14:
DeCarlo, L. T. and Cross, D. V. (1990). Sequential effects in magnitude scaling: Models and theory. Journal of Experimental Psychology: General, 119:
Ferris, S. J., Kempton, R. A., Deary, I. J., Austin, E. J., and Shotter, M. V. (2001). Carryover bias in visual assessment. Perception, 30:
Finney, D. J. and Outhwaite, A. D. (1956). Serially balanced sequences in bioassay. Proceedings of the Royal Society B, 145:
Fisher, A. (1990). Reducing Fat in Meat Animals, chapter New approaches to measuring fat in the carcasses of meat animals, pages London: Elsevier.
Gilks, W., Richardson, S., and Spiegelhalter, D. J. (1996). Markov Chain Monte Carlo in Practice. London: Chapman and Hall.
Morris, R. B. and Rule, S. J. (1988). Sequential judgement effects in magnitude estimation. Canadian Journal of Psychology, 42:
Newton, A. C. and Hackett, C. A. (1994). Subjective components of mildew assessment on spring barley. European Journal of Plant Pathology, 100:
Nonyane, B. A. S. (2004). Design and Analysis for Subjective Assessment of Visual and Taste Stimuli. Doctor of Philosophy thesis, School of Mathematics, University of Edinburgh.
Parker, S. R., Shaw, M. W., and Royle, D. J. (1995). The reliability of visual estimates of disease severity on cereal leaves. Plant Pathology, 44:
Sampford, M. R. (1957). Methods of construction and analysis of serially balanced sequences. Journal of the Royal Statistical Society B, 19:
Sawyer, T. F. and Wesenstein, N. J. (1994). Anchoring effects on judgment, estimation and discrimination of numerosity. Perceptual and Motor Skills, 78:
Spezzaferri, F. (1985). A note on multivariate calibration experiments. Biometrics, 41:
Spiegelhalter, D. J., Thomas, A., Best, N. G., and Gilks, W. (2003). WinBUGS: Bayesian Inference Using Gibbs Sampling. MRC Biostatistics Unit, Cambridge, 1.4 edition.
Tomerlin, J. R. and Howell, T. A. (1988). DISTRAIN: A computer program for training people to estimate disease severity on cereal leaves. Plant Disease, 72:

Table 1: Prior 2 distributions for the parameters in Model 1

Parameter                  Cover task                                Count task
Expected response          $\beta_0 \sim N(-1.008,\ )$               $\beta_0 \sim N(4.062,\ )$
Coef. for current level    $\beta_1 \sim N(0.888,\ )$                $\beta_1 \sim N(0.846,\ )$
Coef. for previous level   $\beta_2 \sim N(0.012,\ )$                $\beta_2 \sim N(0.017,\ )$
Coef. for seq. position    $\beta_3 \sim N(0.002,\ )$                $\beta_3 \sim N(-0.001,\ )$
Standard deviation         $\sigma^2 \sim \chi^2(20)$                $\sigma^2 \sim \chi^2(20)$
Autocorrelation            $\tfrac{1}{2}(\rho + 1) \sim Be(5, 5)$
Future stimulus            $x^* \sim N(-1.09, 0.74)$                 $x^* \sim N(4.09, 1.49)$

Table 2: Mean square errors for count task

                         Posterior expected response
Assessor   Observed      Model 1    Model 1    Model 2
           response      Prior 1    Prior 2    Prior 1
A
B
C
D
E

Table 3: Mean square errors for cover task

                         Posterior expected response
Assessor   Observed      Model 1    Model 1    Model 2
           response      Prior 1    Prior 2    Prior 1
A
B
C
D
E

Figure 1: Sample images for count: 27, 60, and 135

Figure 2: Sample images for cover: 10, 26, and 50 percent

Figure 3: Plots of responses versus true values for cover and count tasks, and their corresponding transformations. Panels: Recorded Cover (%) versus True Cover (%); Logit Recorded Cover versus Logit True Cover; Recorded Count versus True Count; Log Recorded Count versus Log True Count.

Figure 4: Plots for assessor C's responses versus stimuli for first and second occasions and posterior expected stimuli; row 1 corresponds to cover task and row 2 corresponds to count task. Row 1 vertical axes: Logit Recorded Cover, Logit Recorded Cover, Posterior Expected Logit Cover; horizontal axis: Logit True Cover. Row 2 vertical axes: Recorded Count, Recorded Count, Posterior Expected Log Count; horizontal axis: Log True Count.

Figure 5: Information criterion versus R-squared

Figure 6: Plots of log responses versus log true count for Experiment


More information

Biostatistics II

Biostatistics II Biostatistics II 514-5509 Course Description: Modern multivariable statistical analysis based on the concept of generalized linear models. Includes linear, logistic, and Poisson regression, survival analysis,

More information

Logistic Regression and Bayesian Approaches in Modeling Acceptance of Male Circumcision in Pune, India

Logistic Regression and Bayesian Approaches in Modeling Acceptance of Male Circumcision in Pune, India 20th International Congress on Modelling and Simulation, Adelaide, Australia, 1 6 December 2013 www.mssanz.org.au/modsim2013 Logistic Regression and Bayesian Approaches in Modeling Acceptance of Male Circumcision

More information

Recent developments for combining evidence within evidence streams: bias-adjusted meta-analysis

Recent developments for combining evidence within evidence streams: bias-adjusted meta-analysis EFSA/EBTC Colloquium, 25 October 2017 Recent developments for combining evidence within evidence streams: bias-adjusted meta-analysis Julian Higgins University of Bristol 1 Introduction to concepts Standard

More information

Effects of Sequential Context on Judgments and Decisions in the Prisoner s Dilemma Game

Effects of Sequential Context on Judgments and Decisions in the Prisoner s Dilemma Game Effects of Sequential Context on Judgments and Decisions in the Prisoner s Dilemma Game Ivaylo Vlaev (ivaylo.vlaev@psy.ox.ac.uk) Department of Experimental Psychology, University of Oxford, Oxford, OX1

More information

Type and quantity of data needed for an early estimate of transmissibility when an infectious disease emerges

Type and quantity of data needed for an early estimate of transmissibility when an infectious disease emerges Research articles Type and quantity of data needed for an early estimate of transmissibility when an infectious disease emerges N G Becker (Niels.Becker@anu.edu.au) 1, D Wang 1, M Clements 1 1. National

More information

Sampling Weights, Model Misspecification and Informative Sampling: A Simulation Study

Sampling Weights, Model Misspecification and Informative Sampling: A Simulation Study Sampling Weights, Model Misspecification and Informative Sampling: A Simulation Study Marianne (Marnie) Bertolet Department of Statistics Carnegie Mellon University Abstract Linear mixed-effects (LME)

More information

A re-randomisation design for clinical trials

A re-randomisation design for clinical trials Kahan et al. BMC Medical Research Methodology (2015) 15:96 DOI 10.1186/s12874-015-0082-2 RESEARCH ARTICLE Open Access A re-randomisation design for clinical trials Brennan C Kahan 1*, Andrew B Forbes 2,

More information

Detection of Unknown Confounders. by Bayesian Confirmatory Factor Analysis

Detection of Unknown Confounders. by Bayesian Confirmatory Factor Analysis Advanced Studies in Medical Sciences, Vol. 1, 2013, no. 3, 143-156 HIKARI Ltd, www.m-hikari.com Detection of Unknown Confounders by Bayesian Confirmatory Factor Analysis Emil Kupek Department of Public

More information

Lessons in biostatistics

Lessons in biostatistics Lessons in biostatistics The test of independence Mary L. McHugh Department of Nursing, School of Health and Human Services, National University, Aero Court, San Diego, California, USA Corresponding author:

More information

Discrimination and Generalization in Pattern Categorization: A Case for Elemental Associative Learning

Discrimination and Generalization in Pattern Categorization: A Case for Elemental Associative Learning Discrimination and Generalization in Pattern Categorization: A Case for Elemental Associative Learning E. J. Livesey (el253@cam.ac.uk) P. J. C. Broadhurst (pjcb3@cam.ac.uk) I. P. L. McLaren (iplm2@cam.ac.uk)

More information

Research and Evaluation Methodology Program, School of Human Development and Organizational Studies in Education, University of Florida

Research and Evaluation Methodology Program, School of Human Development and Organizational Studies in Education, University of Florida Vol. 2 (1), pp. 22-39, Jan, 2015 http://www.ijate.net e-issn: 2148-7456 IJATE A Comparison of Logistic Regression Models for Dif Detection in Polytomous Items: The Effect of Small Sample Sizes and Non-Normality

More information

Subjective randomness and natural scene statistics

Subjective randomness and natural scene statistics Psychonomic Bulletin & Review 2010, 17 (5), 624-629 doi:10.3758/pbr.17.5.624 Brief Reports Subjective randomness and natural scene statistics Anne S. Hsu University College London, London, England Thomas

More information

Chapter 1: Exploring Data

Chapter 1: Exploring Data Chapter 1: Exploring Data Key Vocabulary:! individual! variable! frequency table! relative frequency table! distribution! pie chart! bar graph! two-way table! marginal distributions! conditional distributions!

More information

Impact of Response Variability on Pareto Front Optimization

Impact of Response Variability on Pareto Front Optimization Impact of Response Variability on Pareto Front Optimization Jessica L. Chapman, 1 Lu Lu 2 and Christine M. Anderson-Cook 3 1 Department of Mathematics, Computer Science, and Statistics, St. Lawrence University,

More information

Multiple Regression Analysis

Multiple Regression Analysis Multiple Regression Analysis Basic Concept: Extend the simple regression model to include additional explanatory variables: Y = β 0 + β1x1 + β2x2 +... + βp-1xp + ε p = (number of independent variables

More information

Pitfalls in Linear Regression Analysis

Pitfalls in Linear Regression Analysis Pitfalls in Linear Regression Analysis Due to the widespread availability of spreadsheet and statistical software for disposal, many of us do not really have a good understanding of how to use regression

More information

Proof. Revised. Chapter 12 General and Specific Factors in Selection Modeling Introduction. Bengt Muthén

Proof. Revised. Chapter 12 General and Specific Factors in Selection Modeling Introduction. Bengt Muthén Chapter 12 General and Specific Factors in Selection Modeling Bengt Muthén Abstract This chapter shows how analysis of data on selective subgroups can be used to draw inference to the full, unselected

More information

Quantitative Methods in Computing Education Research (A brief overview tips and techniques)

Quantitative Methods in Computing Education Research (A brief overview tips and techniques) Quantitative Methods in Computing Education Research (A brief overview tips and techniques) Dr Judy Sheard Senior Lecturer Co-Director, Computing Education Research Group Monash University judy.sheard@monash.edu

More information

Lec 02: Estimation & Hypothesis Testing in Animal Ecology

Lec 02: Estimation & Hypothesis Testing in Animal Ecology Lec 02: Estimation & Hypothesis Testing in Animal Ecology Parameter Estimation from Samples Samples We typically observe systems incompletely, i.e., we sample according to a designed protocol. We then

More information

Introduction & Basics

Introduction & Basics CHAPTER 1 Introduction & Basics 1.1 Statistics the Field... 1 1.2 Probability Distributions... 4 1.3 Study Design Features... 9 1.4 Descriptive Statistics... 13 1.5 Inferential Statistics... 16 1.6 Summary...

More information

Comparison of Meta-Analytic Results of Indirect, Direct, and Combined Comparisons of Drugs for Chronic Insomnia in Adults: A Case Study

Comparison of Meta-Analytic Results of Indirect, Direct, and Combined Comparisons of Drugs for Chronic Insomnia in Adults: A Case Study ORIGINAL ARTICLE Comparison of Meta-Analytic Results of Indirect, Direct, and Combined Comparisons of Drugs for Chronic Insomnia in Adults: A Case Study Ben W. Vandermeer, BSc, MSc, Nina Buscemi, PhD,

More information

Lecturer: Rob van der Willigen 11/9/08

Lecturer: Rob van der Willigen 11/9/08 Auditory Perception - Detection versus Discrimination - Localization versus Discrimination - - Electrophysiological Measurements Psychophysical Measurements Three Approaches to Researching Audition physiology

More information

ST440/550: Applied Bayesian Statistics. (10) Frequentist Properties of Bayesian Methods

ST440/550: Applied Bayesian Statistics. (10) Frequentist Properties of Bayesian Methods (10) Frequentist Properties of Bayesian Methods Calibrated Bayes So far we have discussed Bayesian methods as being separate from the frequentist approach However, in many cases methods with frequentist

More information