SLAUGHTER PIG MARKETING MANAGEMENT: UTILIZATION OF HIGHLY BIASED HERD SPECIFIC DATA. Henrik Kure

Similar documents
A Multi-Process Dynamic Linear Model for Oestrus Detection in Group Housed Sows by Modeling Eating Behaviour

Bayesian Inference Bayes Laplace

Citation for published version (APA): Ebbes, P. (2004). Latent instrumental variables: a new approach to solve for endogeneity s.n.

Advanced Bayesian Models for the Social Sciences. TA: Elizabeth Menninga (University of North Carolina, Chapel Hill)

Advanced Bayesian Models for the Social Sciences

Adjusting for mode of administration effect in surveys using mailed questionnaire and telephone interview data

Bias in regression coefficient estimates when assumptions for handling missing data are violated: a simulation study

A COMPARISON OF IMPUTATION METHODS FOR MISSING DATA IN A MULTI-CENTER RANDOMIZED CLINICAL TRIAL: THE IMPACT STUDY

Georgetown University ECON-616, Fall Macroeconometrics. URL: Office Hours: by appointment

Methods for Computing Missing Item Response in Psychometric Scale Construction

DIETARY ENERGY DENSITY AND GROWING-FINISHING PIG PERFORMANCE AND PROFITABILITY

MISSING DATA AND PARAMETERS ESTIMATES IN MULTIDIMENSIONAL ITEM RESPONSE MODELS. Federico Andreis, Pier Alda Ferrari *

Russian Journal of Agricultural and Socio-Economic Sciences, 3(15)

Logistic Regression with Missing Data: A Comparison of Handling Methods, and Effects of Percent Missing Values

Exploring the Influence of Particle Filter Parameters on Order Effects in Causal Learning

Econometrics II - Time Series Analysis

Kelvin Chan Feb 10, 2015

MS&E 226: Small Data

A Race Model of Perceptual Forced Choice Reaction Time

A Race Model of Perceptual Forced Choice Reaction Time

Abstract. Introduction A SIMULATION STUDY OF ESTIMATORS FOR RATES OF CHANGES IN LONGITUDINAL STUDIES WITH ATTRITION

Stepwise Knowledge Acquisition in a Fuzzy Knowledge Representation Framework

The prevention and handling of the missing data

Combining Risks from Several Tumors Using Markov Chain Monte Carlo

Modeling and Environmental Science: In Conclusion

Bayesian Logistic Regression Modelling via Markov Chain Monte Carlo Algorithm

Bayesian Estimation of a Meta-analysis model using Gibbs sampler

Appendix 1. Sensitivity analysis for ACQ: missing value analysis by multiple imputation

The Relative Performance of Full Information Maximum Likelihood Estimation for Missing Data in Structural Equation Models

Likelihood Ratio Based Computerized Classification Testing. Nathan A. Thompson. Assessment Systems Corporation & University of Cincinnati.

A MONTE CARLO STUDY OF MODEL SELECTION PROCEDURES FOR THE ANALYSIS OF CATEGORICAL DATA

accuracy (see, e.g., Mislevy & Stocking, 1989; Qualls & Ansley, 1985; Yen, 1987). A general finding of this research is that MML and Bayesian

Ordinal Data Modeling

Type and quantity of data needed for an early estimate of transmissibility when an infectious disease emerges

M AXIMUM INGREDIENT LEVEL OPTIMIZATION WORKBOOK

Sponsors. Editors W. Christopher Scruton Stephen Claas. Layout David Brown

Detection Theory: Sensitivity and Response Bias

Measuring Performance Of Physicians In The Diagnosis Of Endometriosis Using An Expectation-Maximization Algorithm

Two-stage Methods to Implement and Analyze the Biomarker-guided Clinical Trail Designs in the Presence of Biomarker Misclassification

ECONOMIC CONSEQUENCES OF BIOLOGICAL VARIATION: THE CASE OF MYCOPLASMA IN SWINE HERDS

Macroeconometric Analysis. Chapter 1. Introduction

Adjustments for Rater Effects in

Impact and adjustment of selection bias. in the assessment of measurement equivalence

1 Introduction. st0020. The Stata Journal (2002) 2, Number 3, pp

The Influence of Test Characteristics on the Detection of Aberrant Response Patterns

Mark J. Anderson, Patrick J. Whitcomb Stat-Ease, Inc., Minneapolis, MN USA

Sensory Cue Integration

Numerical Integration of Bivariate Gaussian Distribution

GROW/FINISH VARIATION: COST AND CONTROL STRATEGIES

Adaptive EAP Estimation of Ability

S Imputation of Categorical Missing Data: A comparison of Multivariate Normal and. Multinomial Methods. Holmes Finch.

Neurons and neural networks II. Hopfield network

Accuracy of Range Restriction Correction with Multiple Imputation in Small and Moderate Samples: A Simulation Study

Detection Theory: Sensory and Decision Processes

Evaluation of Commonly Used Lean Prediction Equations for Accuracy and Biases

Exploring Experiential Learning: Simulations and Experiential Exercises, Volume 5, 1978 THE USE OF PROGRAM BAYAUD IN THE TEACHING OF AUDIT SAMPLING

CLINICAL TRIAL SIMULATION & ANALYSIS

Detection of Unknown Confounders. by Bayesian Confirmatory Factor Analysis

ST440/550: Applied Bayesian Statistics. (10) Frequentist Properties of Bayesian Methods

Computerized Mastery Testing

Multiple imputation for multivariate missing-data. problems: a data analyst s perspective. Joseph L. Schafer and Maren K. Olsen

Structural Equation Modeling (SEM)

Discussion of Technical Issues Required to Implement a Frequency Response Standard

Learning from data when all models are wrong

Enumerative and Analytic Studies. Description versus prediction

Genetic Algorithms and their Application to Continuum Generation

Hierarchical Bayesian Modeling of Individual Differences in Texture Discrimination

Statistical Audit. Summary. Conceptual and. framework. MICHAELA SAISANA and ANDREA SALTELLI European Commission Joint Research Centre (Ispra, Italy)

Statistical Analysis Using Machine Learning Approach for Multiple Imputation of Missing Data

Data Analysis Using Regression and Multilevel/Hierarchical Models

Method Comparison for Interrater Reliability of an Image Processing Technique in Epilepsy Subjects

Lec 02: Estimation & Hypothesis Testing in Animal Ecology

2012 Course : The Statistician Brain: the Bayesian Revolution in Cognitive Science

Optimizing Dietary Net Energy for Maximum Profitability in Growing- Finishing Pigs

Description of components in tailored testing

Supporting Information

10 Intraclass Correlations under the Mixed Factorial Design

Bayes Linear Statistics. Theory and Methods

Introduction to Bayesian Analysis 1

4. Model evaluation & selection

Chapter 11: Advanced Remedial Measures. Weighted Least Squares (WLS)

Small Sample Bayesian Factor Analysis. PhUSE 2014 Paper SP03 Dirk Heerwegh

An Empirical Assessment of Bivariate Methods for Meta-analysis of Test Accuracy

A Bayesian Account of Reconstructive Memory

Fundamental Clinical Trial Design

Supplement 2. Use of Directed Acyclic Graphs (DAGs)

Generalization and Theory-Building in Software Engineering Research

Evaluation of the Magnitude of Ractopamine Treatment Biases When Fat- Free Lean Mass is Predicted by Commonly Used Equations

Item Analysis: Classical and Beyond

Proof. Revised. Chapter 12 General and Specific Factors in Selection Modeling Introduction. Bengt Muthén

Should a Normal Imputation Model Be Modified to Impute Skewed Variables?

BAYESIAN ESTIMATORS OF THE LOCATION PARAMETER OF THE NORMAL DISTRIBUTION WITH UNKNOWN VARIANCE

Expert judgements in risk and reliability analysis

Missing by Design: Planned Missing-Data Designs in Social Science

A Brief Introduction to Bayesian Statistics

Multi Criteria Analysis Of Alternative Strategies To Control Contagious Animal Diseases

BayesRandomForest: An R

Bayesian and Classical Approaches to Inference and Model Averaging

SUPPLEMENTAL MATERIAL

Transcription:

SLAUGHTER PIG MARKETING MANAGEMENT: UTILIZATION OF HIGHLY BIASED HERD SPECIFIC DATA Henrik Kure Dina, The Royal Veterinary and Agricuural University Bülowsvej 48 DK 1870 Frederiksberg C. kure@dina.kvl.dk Abstract: Data from most slaughter pig operations will normally be highly biased due to selection (censoring) of individual pigs prior to the termination of the individual batch. An animal growth model based on mui variate normal distributions is introduced and used as representation of the state of the herd. Methods for reducing the bias and for updating the Belief in the state of the herd are presented. The methods are based on the EM algorithm for bias reduction and on the Kalman fier for belief updating. Models are tested based on simulated data. One of the main features of the Kalman fier the quick adaption to changes in the system modeled is demonstrated by examples. Keywords: Slaughter Pig Marketing, Estimation, Kalman fier, DLM, EM algorithm. 1 Introduction Knowledge of (or a realistic Belief in) the current and expected future state of the slaughter pig finishing operation is crucial in efficient management in a modern slaughter pig production. At present a large quantity of herd specific data is generated internally (i.e. at the farm) and externally (e.g. at the slaughter house) and the quantity and quality of these herd (and animal) specific data is likely to increase in the future as new technologies evolve (see e.g. Geers, 1994 or Saatkamp, 1996, for a current status on identification and recording systems in pig production). However, tools for managing and processing this weah of farm specific data into (for the marketing management) valuable (i.e. precise, relevant and topical) information are not yet developed and/or implemented in practice. Table 1. Mean and variance on carcass weight (CW) with and without selection on the covariate, live weight (W > 93 kg). Based on 200,000 simulated samples. 1) 2) Selection on live weight, W No selection Age 1) E (CW) V (CW) p 2) E (CW) V (CW) 12 76.63 5.02 0.052 64.56 35.32 13 76.17 2.71 0.164 69.00 39.45 14 76.17 2.46 0.255 73.39 43.21 15 75.66 3.21 0.295 77.66 46.45 16 72.79 12.47 0.234 81.87 49.23 Weeks since entering the grower finishing operation (at a weight of 25 kg). Fraction of selected pigs out of total pigs. First European Conference for Information Technology in Agricuure, Copenhagen, 15 18 June, 1997

Figure 1. The animal growth model. Mean growth (at a decreasing rate) and distribution of live weight at 5 different marketing stages. Pigs with a weight above the threshold weight are (except for the last stage at which all remaining pigs are marketed) selected for marketing. Abrupted strait lines have been applied to illustrate the blurred truncations of the left tail of these conditional distributions; i.e. truncation at previous stage is, unless 100% dependence between previous and current stage, blurred at current stage. Based on her/his belief in the herd and in individual pigs the manager will decide when and how (i.e. to which packer) to market pigs. The manager can market all pigs in a batch at the same time or she/he can select individual pigs from the batches in order to reduce the variance on the pigs and consequently increase the number of pigs paid the highest price per kg. As illustrated in table 1 the variance is drastically reduced from 49 kg² without selection and marketing after 16 weeks to approximately 4 kg² when selection based on observed live weight and marketing at all 5 ages is applied. However, the observation will due to this selection be highly biased can not directly be used for estimating the state of the herd. The purpose of this paper is to illustrate one method for eliminating the bias and utilizing the data in dynamically estimating the state of the herd. 2 Models and methods The updating of the belief in the state of the herd is in this presentation partitioned into two steps: A data augmentation and bias reduction step and a belief updating step with the state of the herd represented by the following animal growth model. 2.1 The animal growth model Traits (and growth) of pigs at different ages (as pigs only can be marketed at certain weekdays a discrete model is sufficient) are modeled directly by a mui variate normal distribution as illustrated graphically in fig. 1. The mean growth curve has at each marketing stage been extended with a graph representing the distribution of weight at that particular stage. The shaded areas of the distributions illustrate, as the simulated data in table 1, the effect of selecting pigs for marketing: Reduction of variance.

2.2 Bias reduction The EM algorithm is a method for estimating model parameters from incomplete observations (see e.g. Dempster et al, 1977; Little & Rubin, 1987; Tanner, 1993). The basic idea of the method is to augment the data by generating the missing (not observed) data based on the current guess on the value of the model parameters and the observed data (i.e. the expectations of missing data given the observed data and the current guess of model parameters the E step). Model parameter estimates are then (re)calculated (parameter likelihood maximization the M step) based on this complete data set and the two steps are repeated until convergence. In the method implemented here the E step has been extended with Monte Carlo simulations in order to include conditional variances and avoid numerical instability when calculating Maximum Likelihood estimates. Different levels of additional information. Given different levels of information different assumptions and versions of the EM algorithm are applied: a) High level of information (eg. live weight observed at all stages) and no additional assumptions The EM1 algorithm: The basic EM algorithm applied directly b) Low level of information (eg. only carcass weight observed at slaughter) and selection criterion assumed known The EM2 algorithm: The basic EM algorithm applied directly, but conditioned on the selection criterion. c) Low level of information and shape of the growth function assumed known The EM3 algorithm: The EM algorithm applied with just a single iteration step and constant initial guess. These 3 methods are presented and discussed in detail in Kure (1997). Major resus are presented below. 2.3 Belief updating the Kalman fier A steady Dynamic Linear Model (DLM, see e.g. Harrison and Stevens, 1976; West and Harrison, 1989), also known as Kalman fiering (Kalman, 1960) is used for data fiering/updating the beliefs in the logical groups of the herd the belief in the group effects of the logical groups. The Steady DLM is a simple polynomial DLM of zero degrees: y v, v N(0,V ) (a) 1 w, w N(0,W ) (b) (1) where (a) is the observation equation and (b) the system equation. W is the system variance (or disturbance variance) while V is the observational variance (or observation noise). Logical groups are assumed independent and they are handled/updated independently. Given an initial prior distribution l0 N (m l0, M l0) (where M l0 = C l0 and is a predefined constant) of the expectation of a given group of pigs, l, and a set of prior information D = {y, D }, where y is 1 the observation at time (or updating step) t and D is the initial information, the posterior distribution of l0 the process parameters at step t, ( D )N(m,M ) may be derived recursively using a Bayesian or Kalman fier approach (Kalman, 1960; Harrison and Stevens, 1976) as defined by the updating Table 2. Updating recurrence equations of the Steady DLM. (Harrison & Stevens, 1976) Let = m 1 e = y R = M 1 + W = R + V A = R() 1 Then m = m 1 + Ae M = R AA T

recurrence relationships in table 2. In the model used in this paper y is the unbiased complete sample mean of the n individuals observed at time or updating step t while V is the variance on the observational error on this sample mean. y and V are the resu of the bias reduction step presented above. The Kalman fier is used to dynamically update the belief in the expectations, while other (and more simple) methods are applied in order to estimate/update the belief in the variances (i.e. estimates of C l in the animal model). Model intervention. External information or observation on other traits/variables than those represented in the model can (prior to actually observing changes in the traits modeled) cause changes in the managers belief in the state of the herd. These changes are in the model handled simply by changing the prior distributions. The mean m is changed according to the new expectations (if any), while the covariance M is changed according to the uncertainty of the belief and/or the desired adaption rate to new observation a larger (co)variance implies a quicker adaption to new observation. 3 Resus The models and methods are tested on simulated data using the model developed and described by Jørgensen (1993) and Jørgensen & Kristensen (1995). Simulated data were chosen for the testing in order to get full information of all traits at all marketing stages; In the simulation model it is possible to generate data on all traits (even latent traits) of all pigs at all stages and the true distribution of traits or the true animal growth model can be estimated directly based on data and used as a reference in the calculations. Basic resus of the EM2 compared to the EM1 algorithm are shown in fig. 2. The level of information is low; live weight (W) and carcass weight (CW) are only observed once. The conditional sampling (with perfect initial prior) in the EM2 algorithm reduces the error (i.e. the difference between the true observed mean with perfect information and the estimated mean) on the estimates in the figure from 9.61 to 1.38 kg and the errors on the other 9 variables are even lower (0.46 kg in average). In the situation of a poor initial guess the convergence of the algorithm is worse; the convergence is slower and the stationary values on the other variables than W 5 are more divergent from the true values. To illustrate and test the abilities of the Kalman fier (or the DLM) the EM3 method has been applied to a second data set, in which an intermediate system change have been simulated (i.e. a 10% increase in growth rate after 150 days). The resus are shown in figure 3. The expectation of the initial prior is underestimated by 10%, but a high initial adaption rate (i.e. a high initial variance on (or uncertainty in) W 5, live weight at stage 5 Weight (kg) 104 102 100 98 96 94 92 90 0 20 40 60 80 100 120 140 Iteration # Observed mean, perf. info. W and CW observed once, threshold known W and CW observed once, threshold unknown W and CW observed once, th. known, perf. prior Figure 2. Convergence of the EM algorithm, with and without the assumption that the selection criterion is known and given different initial guesses. Resus of the 1st batch (96 pigs).

the belief, M in (1)) resus in a quick adjustment of the belief in expectation towards the true l0 expectation. It should be noted that a slow and incorrect growth model (in which e.g. E (W ) is 86% 1 of E (W ) after the system change) has been used in the EM3 algorithm. 1 The manager of the slaughter pig operation is assumed to know that the state of the system is going to change (the change might even be induced by the manager), but the expected magnitude of the change is unknown. In the fiering this knowledge is modeled as an instant increase in uncertainty (i.e. in M ) at the time of the expected full effect of the system change, resuing in an instant increase in the adaption rate as illustrated in figure 3. Consequently the fier adjusts very rapidly to the first new observations (i.e. at update # 8) and after a few updates the fier has re stabilized and the variances (confidence intervals in figure 3) on estimates have decreased to the same level as before the system change. 4 Discussion Pigs are normally not individually identified; pigs are simply members of a group (e.g. a batch), but even this group identification might be blurred or even missing, e.g. during marketing. The methods presented here rely heavily on the assumption that the group membership of each individual can be identified and missing group identification may be the biggest problem in applying/implementing the proposed methods in practical slaughter pig marketing management. Another (minor) problem may be imposed by non normal distributions of traits. In general managers may treat different pigs differently (e.g. poor performing pigs are helped to catch up with better performing section mates) in order to increase profitability. As a consequence of this discrimination (and selection) the expected natural normal distribution of traits are disturbed. However, based on an analysis of the empiric data materials the assumption of normally distributed traits cannot in general be rejected and most errors resuing from non normal distributions can be eliminated by choosing appropriate logical groups. All three EM based methods are quite insensible to errors in assumptions/initial guesses and in most cases return acceptable estimates. The actual choice of a method (EM1, EM2, EM3) in a particular situation depends mainly on the informational level, but the knowledge of/confidence in a general or herd specific animal growth model might influence the choice between EM2 and EM3. EM3 is from a computational point of view the most appealing methods as the simulations and iterations of the basic EM algorithm (which can be very processor consuming hours of processing time on a Pentium based 90 W1, CW3 (kg) 85 80 Y_W1 Y_CW3 m^_w1 m^_cw3 95% m_w1 m_cw3 75 70 0 5 10 15 20 25 30 35 40 Update # Figure 3. Model output: Growth change and EM3 with intervention. Unbiased observations (Y_W1, Y_CW3), true (m_w1, m_cw3) and estimated (m^_w1, m^_cw3) means and 95 % confidence intervals on estimates.

Personal Computer) are avoided, but the method does not return estimates of (co)variances. The output of the methods (i.e. parameter estimates) is primarily intended for input in a DSS, but as it represents the current (updated) belief in the herd it may as well serve other management purposes, including a general monitoring of animal performance. The proposed animal growth model was formulated as a strictly empiric model specifically for the purpose of slaughter pig marketing management support, but it may very well prove useful as a stochastic extension of traditional deterministic, mechanistic and continuous growth models. In the simplest case, predictions of expectations of the proposed animal growth model can be assessed by applying an empiric and deterministic growth model in a more traditional fashion (i.e. given some changes in the system, e.g. feeding). In the more general situation the special structure of variances and correlations in the animal growth model can be utilized; Variances and correlations of the model can be modeled by functional relations. Variances are then given as a function of time and correlations as a function of two times (two stages). The Kalman fier proved to be useful in dynamically updating parameter estimates. In the situations of a steady production (i.e. no change in growth rates) more simple methods like a moving average or the EM algorithm applied to all pigs of the last 5 or 10 marketings may return the same or even better resus, but in the case of a system change the true power of the fier was revealed: The ability to quickly adjust to sudden changes and the option of the model builder/user to intervene and supply additional information. The simplest fier (the steady DLM) was chosen for the purpose, but other more sophisticated models, including mui level models and seasonal models (see e.g. Harrison & West, 1989) can be applied in order to improve model performance and longer term predictions (e.g. from month to month or quarter to quarter). 5 References Dempster, A.P., N.M. Laird & D.B. Rubin (1977): Maximum Likelihood from incomplete data via the EM algorithm (with discussion). J. of the Royal Stat. Society, Series B, 39, 1 38. Geers, R. (1994): Electronic monitoring of farm animals: a review of research and development requirements and expected benefits. Computers and Electronics in Agricuure, 10, 1 9. Harrison, P. J. & C.F. Stevens (1976): Bayesian Forecasting (with discussion). Journal of the Royal Statistical Society, Series B, 38, 205 247. Jørgensen, E. (1993): The influence of weighing precision on delivery decisions in slaughter pig production. Acta Agri. Scand., Section A, 43 (3), 181 189. Jørgensen, E. & A.R. Kristensen (1995): An Object Oriented Simulation Model of a Pig Herd with Emphasis on Information Flow. Dina Research Report No. 32. Dina, Copenhagen. 11pp. Kalman, R. E (1960). A new approach to linear fiering and prediction problems. J. Basic Eng., Series D, 82, 35 46. Kure, H. (1997): Marketing Management Support in Slaughter Pig Production. Ph.D. Thesis, Royal Veterinary and Agricuural University, Copenhagen, Denmark. 108pp. Little, R.J.A. & D.B. Rubin (1987): Statistical analysis with missing data. John Wiley & Sons, New York. 278pp. Saatkamp, H.W. (1996): Simulation studies on the potential role of national identification and recording systems in the control of Classical Swine Fever. Ph.D. thesis, Wageningen Agricuural University, the Netherlands. 120pp. Tanner, M.A. (1993): Tools for statistical inference. Methods for the exploration of posterior distributions and likelihood functions. Springer Verlag, New York. 156pp. West, M. & J. Harrison (1989): Bayesian forecasting and Dynamic Models. Springer Verlag, New York, 704pp.