COMPUTER-BASED BIOMETRICS MANUAL


Student Name:
Student No:

(Using GenStat for Windows)

For BIOMETRY 222
EXPERIMENTAL DESIGN & MULTIPLE REGRESSION
2006

School of Statistics and Actuarial Science
University of KwaZulu-Natal
Pietermaritzburg Campus
Private Bag X01
Scottsville 3209, South Africa

Compiled by: Peter M. Njuho, PhD
Senior Lecturer

Goal: To develop an understanding of statistical analysis and the ability to interpret the results obtained using GenStat.

Objectives:
To develop a clear understanding of the statistical concepts used in the design of experiments.
To learn how to fit multiple regressions and interpret the parameter estimates.
To develop ability in the use of GenStat.
To understand how results obtained using calculators relate to those obtained using GenStat.
To develop the skills and ability to interpret statistical results.

Introduction

The use of this computer laboratory manual assumes knowledge of Biometry 210, Introduction to Biometry, and some basics of Windows-based GenStat. The manual has been developed to supplement the course material given in Biometry 222, Experimental Design & Multiple Regression. It is divided into tutorials, each of which begins with background information. Exercises that test the understanding of the concepts are given together with computer-oriented exercises. The ability to interpret results from the analyses is tested through structured questions.

An attempt is made to guide the student towards the crucial GenStat directives. The student is required to ask for help where these directives fail to work or are not clear. It is advisable for each student to work independently and later compare results with a colleague. The student is expected to note down answers in the blank space provided.

The data sets referred to in all the exercises are stored on the agriculture computer laboratory server, Pietermaritzburg Campus. The directory can be accessed as F:\Users\Biometry\Biom222\

TUTORIAL ONE

Topic: Concept on Experimental Unit & Experimental Design

Background

Experimental design is a planned arrangement of treatments onto experimental units in such a way that bias is minimized. The factor of interest under investigation is called the treatment, whereas an experimental unit is the smallest unit to which a treatment is applied. Independent application of a treatment to more than one experimental unit is referred to as replication. Replication is necessary for the purpose of estimating experimental error.

Remark: The exercises in this tutorial do not require the use of a computer. They are meant to assess your understanding of the basic concepts of experimental design.

Exercise 1.1

A researcher conducted an experiment to compare two room temperatures for doing a particular type of work. There were 6 rooms available for experimentation. Three randomly selected rooms were set at 60 degrees and the other three were set at 72 degrees. Five workers were put in each room and various measurements were made on each worker relative to their work performance.

a) What are the treatments in this experiment?
b) What are the experimental units?
c) How many replications are there for each treatment?
d) Provide an analysis of variance table outlining only the sources of variability and the degrees of freedom.

Source of variation    Degrees of freedom

Total

Exercise 1.2

After the experiment was designed as in Exercise 1.1, the researcher decided to add another factor called task. There were 5 tasks in all. Within each room one task was assigned randomly to each worker so that all tasks were done in each room.

a) What are the treatments?
b) Explain why this is not a completely randomized design.
c) How would the experiment have to be changed so that it would be completely randomized? Is such an experiment practical?
d) How would you design the experiment if there were only one room for experimentation?

Exercise 1.3

A researcher was interested in the ability of two chemicals to retard the spoilage of grain. A bin of grain was treated with chemical C1 and another identical bin was treated with chemical C2. The researcher took 10 samples from each bin after an appropriate period of time and measured the spoilage in each sample.

a) What are the treatments?
b) What are the experimental units?
c) How many replications are there in this experiment?
d) Provide an analysis of variance table outlining only the sources of variability and the degrees of freedom.

Source of variation    Degrees of freedom

Total

TUTORIAL TWO

Topic: Concept on Completely Randomized Design

Background

A completely randomized design (CRD) is one in which the experimental units are assumed to be homogeneous. The randomization process ensures that each treatment has an equal chance of being assigned to any of the experimental units. A randomization scheme can be established using either a table of random numbers or computer-generated random numbers. Refer to the discussion on randomization made in class. Replication occurs when a treatment is allocated independently to more than one experimental unit. Treatments may be unequally replicated, depending on the level of precision required for each treatment. More replications imply higher precision.

Remark: The exercises in this tutorial do not require the use of a computer. The questions are meant to assess your understanding of the concepts of a completely randomized design applied under different scenarios. Utilize the following extract of random numbers whenever a randomization scheme is required.

Exercise 2.1

An experiment was conducted in which fifty students were shown a film on nutrition and another fifty were not. Each group was given a test on nutrition. The test was given at the same time to all the students, and it was given after the first group viewed the film. The purpose was to determine whether the film increases knowledge of nutrition.

a) What are potential sources of bias in this experiment?

b) Demonstrate how you would handle the assignment of students to one of the two groups so that the potential sources of bias do not in fact bias the results of the experiment.

Exercise 2.2

Fifteen consumers are to be selected at random from a certain population to evaluate one of three formulations of a food product, P1, P2 and P3. Each consumer will evaluate only one of the formulations, and the researcher who is conducting the study can deal with only one consumer at a time.

a) Come up with a completely randomized design for collecting the data. Make sure that your description of what to do clearly specifies who is to evaluate each product and when the evaluation is to take place.

b) Provide an analysis of variance table outlining only the sources of variability and the degrees of freedom.

Source of variation    Degrees of freedom

Total

Exercise 2.3

Twenty laboratory mice are to be used in a nutrition experiment. The factors involved in the experiment are protein (2 levels, P1 and P2) and fat (2 levels, F1 and F2). The diets consist of the possible combinations of the protein and fat levels.

a) Come up with a plan in which the mice are randomly assigned to the diets.

b) Provide an analysis of variance table outlining only the sources of variability and the degrees of freedom.

Source of variation    Degrees of freedom

Total

c) The researcher decides to add a control diet to the experiment consisting of mouse chow. How would the diets be assigned to the mice so that the design is completely randomized?

d) The researcher would also like to add an exercise factor. The levels of exercise in the experiment are to be E1 and E2. (Apparently, the level of exercise can be controlled by selecting the type of equipment that goes in the mouse cages.) Come up with a plan in which the diets and exercise levels are randomly assigned to the mice.
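The background to this tutorial mentions computer-generated random numbers as an alternative to a random number table. The following is a minimal sketch of that idea in Python rather than GenStat; the function name `randomize_crd`, the treatment labels T1 to T3, the replication counts and the seed are all hypothetical and are not taken from any exercise in this manual.

```python
import random


def randomize_crd(treatments, reps, seed=None):
    """Return the treatment assigned to each experimental unit of a CRD.

    treatments: list of treatment labels (hypothetical labels here)
    reps: dict mapping each label to its number of replications,
          which may differ between treatments
    seed: optional seed so that the layout can be reproduced
    """
    rng = random.Random(seed)
    # One label per experimental unit, then shuffle the complete list so
    # every unit is equally likely to receive any treatment.
    labels = [t for t in treatments for _ in range(reps[t])]
    rng.shuffle(labels)
    return labels


# Hypothetical example: three treatments with four units each.
layout = randomize_crd(["T1", "T2", "T3"], {"T1": 4, "T2": 4, "T3": 4}, seed=222)
for unit, treatment in enumerate(layout, start=1):
    print(f"unit {unit:2d}: {treatment}")
```

Recording the seed alongside the layout is one way to document how the randomization was obtained; a random number table serves the same purpose on paper.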

TUTORIAL THREE

Topic: Additional Exercises on Completely Randomized Design

Background

The case of an unequal number of replications is introduced through Exercise 3.1. The within-treatment variability is pooled to form the overall experimental unit variability, which is necessary for estimating the experimental error. Once an overall ANOVA is performed, further analysis is needed to investigate which treatments are significantly different. This is achieved by performing t-tests or by partitioning the treatment degrees of freedom into single degrees of freedom. Each single degree of freedom is associated with a contrast. A contrast is a logical question constructed using certain treatments. Oftentimes, orthogonal contrasts are preferred. The term orthogonal refers to non-overlapping of information. The concept of orthogonality is introduced in Exercise 3.2.

Remark: The exercises will take you from data entry through to the actual analysis using GenStat statistical directives. You can choose to enter the data into an Excel spreadsheet and then copy and paste it through the Clipboard into a GenStat spreadsheet. Alternatively, you can open a new GenStat spreadsheet, define the number of rows and columns, and then enter the data.

Exercise 3.1

Consider data from an experiment set up to compare the effects of four levels of thinning on the height growth of Eucalyptus trees. Ten plots per thinning treatment were used. Initially there were 12 trees per plot and, after 5 years of growth, 30 randomly selected plots were thinned to 4, 6 and 9 trees (10 plots each). No thinning took place in the remaining 10 plots, which kept 12 trees. After 10 more years, the heights were measured. Some plots were missing due to illegal felling of the trees (unbalanced case).

Number of trees per plot (Treatments)

Here you are interested in comparing the four treatments, in this case the four levels of thinning (4, 6, 9 and 12). Since the experiment was carried out as a completely randomized design and your response variable is height, you would want to do a one-way analysis of variance. The null hypothesis in this case is that the mean heights for the four levels are all the same. You therefore want to investigate whether or not you can reject this hypothesis.

a) Enter your data into a spreadsheet. The data set requires 2 columns and 30 rows, where column 1 is the treatment (call it Thin) and column 2 is the response variable, Height. Convert column 1 into a factor after entering the data.

How to get started. Once you have opened GenStat, click on Spread, followed by New and then Create. Enter the number of rows as 30 and the number of columns as 2, and then click Ok. Move the cursor onto C1 and click the right mouse button. Click on Rename. This allows you to type Thin. Do the same for C2 and type the name Height. You can now start entering the data. The format will be like:

Thin    Height

In case you chose to enter the data in an Excel spreadsheet, the process to copy the data into a GenStat spreadsheet is as follows. Highlight the data plus the column names and click Copy. Move to the GenStat window and click on Spread, then New, and then From Clipboard. This takes you to a window called New Spread from Clipboard. Select the necessary boxes and ensure that the box "Column names are in the first row" is selected.

b) Provide the ANOVA outline, giving only the sources of variation and degrees of freedom.

Source of variation    Degrees of freedom

Total

c) Carry out the analysis as a one-way ANOVA with no blocking. To conduct the analysis, click on Stats and then Analysis of Variance. Select General and, in the Design box, select One-way ANOVA (no blocking). Click on Height to move it into the Y-Variate box. The treatment structure in this case is Thin. Click to move it into the Treatments box. Remember to have converted Thin into a factor. Leave the Block box blank. Record the information from the analysis below.

Source of variation    D.F.    SS    MS    V.R.    F pr

Total

d) Record in the table below the mean, number of replications and standard error associated with each treatment.

Treatment    Treatment Mean    Number of replications    Standard error

e) Outline the conclusions you draw from your ANOVA. Remember to state the hypotheses being tested and the level of significance you prefer to use.

f) Perform multiple comparison tests using the least significant difference (LSD) approach to determine which treatments are significantly different at the 5 % significance level. You will need the differences between the treatment means and the standard error of the difference to be able to do this.

g) Check whether the assumptions required for ANOVA, namely normality, independence and constant variance, are valid. You will get this information by clicking on the Further Output box after executing the ANOVA and ticking the appropriate boxes.

Exercise 3.2

A nursery experiment was conducted to study the growth performance of Albizia zygia seedlings under different fertilizer treatments. Four treatments were included in the design, which was completely randomized with 10 replicates. Data on plant height (in cm) were recorded after a fixed period of time. The treatments were:

A: one dose of cowdung
B: two doses of cowdung
C: poultry manure
D: control

Fertilizer    Plant Height (in cm)
A
B
C
D

a) Enter the data in a spreadsheet and save it as cowdung.gsh. Again, you require 2 columns and 40 rows to enter the data. Name the first column Fertilizer and the other PlantHt. Once you have finished entering the data, convert the treatment into a factor and save the file.

b) Present an outline of the ANOVA, identifying only the sources of variation and the degrees of freedom.

Source of variation    D.F.

Total

c) Analyse the data as a simple CRD. On the Stats menu, select Analysis of Variance, then General. While in General, scroll down to find Completely Randomized Design. The Y-variate in this case is PlantHt and the treatment is Fertilizer. Record your results below:

Source of variation    D.F.    SS    MS    V.R.    F pr

Total

Remark: Construction of linear contrasts

The four treatments have a structure that allows for the construction of orthogonal contrasts. A contrast is a linear function of treatment means whose coefficients sum to zero. For instance, the linear contrast comparing treatments 1 & 2 against 3 & 4 has coefficients 1, 1, -1, -1. If you add these coefficients you get zero. Two contrasts are said to be orthogonal if the sum of the cross-products of their coefficients equals zero. Suppose we have another linear contrast comparing treatment 1 against treatment 2. This linear contrast has coefficients 1, -1, 0, 0. To show that the two linear contrasts are orthogonal, all we need to show is that (1)x(1) + (1)x(-1) + (-1)x(0) + (-1)x(0) = 0. In general, with t treatments, we can construct t-1 orthogonal contrasts. In this case we have 4 treatments, which implies that we can construct 3 orthogonal contrasts. The three logical pre-planned comparisons in this case are:

1. One dose of cowdung versus two doses of cowdung. This corresponds to coefficients 1, -1, 0, 0.

2. Cowdung manure versus poultry manure. The coefficients are: 0.5, 0.5, -1, 0.
3. Applying versus not applying manure. The coefficients are: 1, 1, 1, -3.

You will realise that any other contrast will not be independent of these three. The construction of the linear contrasts depends on the research questions that are of interest. More linear contrasts could still be constructed to answer other questions of interest, but one should always bear in mind that such questions are no longer independent.

d) Demonstrate how each of these sets of coefficients was obtained and verify that the three comparisons are pairwise orthogonal.

Remark: Testing the contrasts

To test the three comparisons, click on Contrasts while in the Completely Randomized Design dialog. Select Comparisons by ticking the box. Click on the effect to be analysed, which in this case is Fertilizer, and indicate 3 as the number of comparisons to be made. Click Ok. This takes you to a matrix in spreadsheet form with 3 rows and 4 columns. The four columns correspond to the treatments and the rows correspond to the questions, or contrasts. Enter each set of comparison coefficients as given. You can name the rows Contrast1, Contrast2 and Contrast3. Alternatively, you can use the actual names associated with the comparisons, for instance A vs B, (A+B)/2 vs C, and (A+B+C)/3 vs D for rows 1, 2 and 3, respectively.

e) List the following information associated with these contrasts from your output.

Contrasts    D.F.    Sum of Squares    Mean Square    V.R.    F pr
Contrast 1
Contrast 2
Contrast 3

f) Use the information in part (e) to answer the following questions using a significance level of 5 %.

i) Is there a difference between using one or two doses of cowdung? i.e. A versus B, with coefficients 1, -1, 0, 0.
ii) Do the cowdung treatments give different means compared to poultry manure? i.e. (A+B)/2 - C, which gives coefficients 0.5, 0.5, -1, 0.
iii) Is there any difference between applying and not applying fertilizer? i.e. (A+B+C)/3 - D, which, multiplied through by 3, gives coefficients 1, 1, 1, -3.

g) Provide overall conclusions indicating the treatment you would recommend and why.
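The two conditions given in the remark on linear contrasts (coefficients summing to zero, and a zero sum of cross-products for orthogonality) can be checked mechanically. The sketch below, in Python rather than GenStat, uses only the two illustrative contrasts from the remark; the function names are hypothetical. It assumes equal replication, which is the setting in which the simple cross-product rule applies.

```python
def is_contrast(coeffs, tol=1e-12):
    """A set of coefficients defines a contrast if it sums to zero."""
    return abs(sum(coeffs)) < tol


def are_orthogonal(c1, c2, tol=1e-12):
    """Two contrasts are orthogonal (equal replication assumed) if the
    sum of the cross-products of their coefficients is zero."""
    return abs(sum(a * b for a, b in zip(c1, c2))) < tol


# The two illustrative contrasts from the remark on linear contrasts.
groups_12_vs_34 = [1, 1, -1, -1]   # treatments 1 & 2 against 3 & 4
trt_1_vs_2 = [1, -1, 0, 0]         # treatment 1 against treatment 2

print(is_contrast(groups_12_vs_34), is_contrast(trt_1_vs_2))
print(are_orthogonal(groups_12_vs_34, trt_1_vs_2))
```

The same two functions can be applied to any pair of coefficient rows before they are typed into the GenStat comparison matrix.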

TUTORIAL FOUR

Topic: Randomized Complete Block Design

Background

The randomized complete block design, abbreviated RCBD, is the most widely used design in agricultural experiments owing to its ability to control inherent variability that is uni-directional. When the experimental units are not homogeneous and the pattern of variability can be characterized, blocking techniques should be applied. The experimental units are grouped into homogeneous sets, referred to as blocks, in such a way that the variability within blocks is minimized whereas the variability between blocks is maximized. All the treatments are randomized within each block, using an independent randomization scheme for each block. The orientation of the blocks should be orthogonal (perpendicular) to the variability gradient. It should be noted that the blocks need not be contiguous, and it is possible to have more than one replication within a block. The blocks are assumed to have been drawn from a large population of possible blocks and for that reason they are considered to be random effects. The interest is therefore mostly in quantifying the amount of variability accounted for by the blocks rather than in whether a particular block is significantly different from another.

The following should be noted with an RCBD:

a) Blocks should be laid perpendicular to the gradient.
b) Blocks need not be contiguous.
c) It is possible to replicate within a block.
d) A block should signify a known source of variation that needs to be controlled by the experiment.
e) All the treatments should be randomized within each block, with an independent randomization in each block.

Even when no obvious natural blocks exist, it is still sensible to define blocks representing major patterns of variation. For instance, in on-farm experiments one may use farmers' knowledge of the crops grown in the previous season and of fertility patterns within the farming area.

Missing data can also occur in an RCBD. An advantage of the design is that the analysis can still be performed in the event of losing a complete block or replication.

Remark: Exercise 4.1 tests your understanding of the construction of an RCBD through the randomization process, whereas Exercises 4.2 and 4.3 establish the link between a paired t-test and an RCBD. Each pair acts as a block. However, randomization within the block is not possible since the structure is one of before and after treatment application. Nevertheless, the principle of blocking remains one of removing the known variability from the experimental error. The purpose of blocking is to reduce the experimental error in order to make the overall test more sensitive to small differences between treatment means.
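The link between a paired t-test and an RCBD with two treatments, mentioned in the remark above, can be illustrated numerically. The following Python sketch (not GenStat) computes the paired t statistic and the RCBD variance ratio for the same data and compares F with the square of t. The five pairs of values are hypothetical and are not the mice data; the function names are likewise assumptions made for illustration.

```python
import math
from statistics import mean, stdev


def paired_t(x, y):
    """Paired t statistic for H0: mean difference = 0."""
    d = [a - b for a, b in zip(x, y)]
    return mean(d) / (stdev(d) / math.sqrt(len(d)))


def rcbd_f(x, y):
    """Variance ratio for treatments in an RCBD with 2 treatments,
    treating each pair (x[i], y[i]) as a block."""
    n = len(x)                                   # number of blocks
    values = x + y
    grand = mean(values)
    ss_total = sum((v - grand) ** 2 for v in values)
    ss_trt = n * ((mean(x) - grand) ** 2 + (mean(y) - grand) ** 2)
    ss_block = 2 * sum(((a + b) / 2 - grand) ** 2 for a, b in zip(x, y))
    ss_error = ss_total - ss_trt - ss_block
    ms_trt = ss_trt / 1                          # 2 - 1 = 1 d.f.
    ms_error = ss_error / (n - 1)                # (2 - 1)(n - 1) d.f.
    return ms_trt / ms_error


# Hypothetical paired measurements (not the data in micepair.gsh).
first = [0.42, 0.39, 0.51, 0.47, 0.44]
second = [0.38, 0.40, 0.45, 0.41, 0.43]

t_value = paired_t(first, second)
f_value = rcbd_f(first, second)
print(f"t = {t_value:.4f}, t^2 = {t_value ** 2:.4f}, F = {f_value:.4f}")
```

Removing the block (pair) sum of squares from the error is what makes the two analyses agree; an unblocked one-way ANOVA on the same values would in general give a different F.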

Exercise 4.1

An animal scientist has 6 treatments (A, B, C, D, E and F), laid out in a randomized complete block design (RCBD) using 3 blocks. Consider the following extract from a random number table.

a) Determine the random layout of the field experiment for the scientist. (Show a random layout of the treatments, explaining each step you make.)

b) Give the analysis of variance (ANOVA) table showing only the sources of variation and degrees of freedom.

Source of variation    D.F.

Total

c) Give a mathematical model and state the assumptions associated with it.

d) State the null and alternative hypotheses.

Exercise 4.2

The cooling constants of freshly killed mice and those of the same mice reheated to body temperature were determined. Nineteen mice were used in the experiment. This was a paired experiment. The data are stored in F:\Users\Biometry\Biom222\micepair.gsh

a) Analyse the data using a paired t-test (i.e. test the hypothesis of no difference between the population means). On the Stats menu, click on Statistical Tests, and then select One or Two Sample Tests. In the Test box, select paired t-test. This is a two-sided test in which you are testing whether the difference is equal to zero. List your output in the following space.

b) What are your conclusions?

Exercise 4.3

The same data used in Exercise 4.2 have been re-entered in an RCBD data entry format. The data are stored in F:\Users\Biometry\Biom222\micepair.gsh. On the Stats menu, click on Analysis of Variance. Select General and, in the Design box, select One-way ANOVA (in Randomized Blocks). Double click to move the variable from the Available Data box to the Y-Variate box. Double click on the treatment to move it into the Treatments box and do the same for the block. Remember that both the treatments and the blocks are factors.

a) Analyse the same data as a randomized complete block design. (Note: we have 2 treatments and 19 blocks.) List the following information from your output.

Source of variation    D.F.    SS    MS    V.R.    F pr

Total

b) What are your conclusions?

c) Verify that the calculated F equals the square of the calculated t (i.e. F = t^2).

Exercise 4.4

Seven litters, each of five rats, were used in a randomized complete block design, with litters taken as blocks. The researcher is interested in studying the effects of five different diets on the weight gain of the rats. The data are stored in F:\Users\Biometry\Biom222\litter.gsh.

a) Demonstrate, using the extract of random numbers given in Exercise 4.1, how you could allocate the five diets within the seven litters, considering each litter as a block.

b) Carry out the analysis of variance to see whether there are any differences between the diets. Click on Stats, then Analysis of Variance, and then select One-way ANOVA with blocking. Use a 5 % significance level. Summarize the output in the space below.

c) List the treatment means and the standard error.

Treatment No.    Treatment mean    Standard error

d) Perform an LSD test to determine which treatment means are different at the 5 % significance level.

e) Comment on the validity of the ANOVA assumptions. (You need to get the appropriate residual plots by clicking on the Further Output option immediately after executing the ANOVA.) Base your comments on the histogram, the half-normal plot and the residual plot which you get from Further Output. Cut and paste these plots in the space provided below.

f) What proportion of the total variation is accounted for by the blocks?

g) Compute the relative efficiency of the RCBD compared to a CRD and explain the gain or loss.

Exercise 4.5

An experiment was carried out to compare the effects of various fungicide treatments on the growth and yield of oilseed rape. Four plots for each of the five treatments were laid out in a randomized complete block design. The treatments (labelled A, B, C, D and E) were:

A: untreated control
B: standard fungicide applied at time 1
C: new fungicide applied at full rate at time 1
D: new fungicide applied at full rate at time 2
E: new fungicide applied at half rate at times 1 and 2

The data are stored in F:\Users\Biometry\Biom222\oilseed.gsh.

a) Carry out the analysis of variance to see whether there are any differences between the treatments. Click on Stats, then Analysis of Variance, and then select One-way ANOVA with blocking. Use a 5 % significance level.

b) Fit contrasts to assess the overall difference between the control and the new fungicide, the overall difference between the standard and the new fungicide, and the difference between application times 1 and 2 for the new fungicide. First provide the necessary coefficients associated with these contrasts. You will enter these coefficients into a matrix which you obtain after clicking on Contrasts and then selecting Comparisons.

c) What are your overall conclusions?
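Several exercises in Tutorials Three and Four ask for an LSD test based on the standard error of the difference between two treatment means. A minimal Python sketch of that calculation follows, using the standard forms SED = sqrt(MSE x (1/r_i + 1/r_j)) and LSD = t x SED, where MSE is the error mean square and t is the two-sided critical value on the error degrees of freedom. The treatment means, replication counts and error mean square below are hypothetical; the critical value 2.447 is the tabulated two-sided 5 % t value on 6 degrees of freedom and must be looked up afresh for any other error degrees of freedom.

```python
import math
from itertools import combinations


def sed(mse, r_i, r_j):
    """Standard error of the difference between two treatment means."""
    return math.sqrt(mse * (1 / r_i + 1 / r_j))


def lsd(t_crit, mse, r_i, r_j):
    """Least significant difference for a pair of treatments."""
    return t_crit * sed(mse, r_i, r_j)


def significant_pairs(means, reps, mse, t_crit):
    """Return the pairs of treatments whose means differ by more than the LSD."""
    pairs = []
    for a, b in combinations(sorted(means), 2):
        if abs(means[a] - means[b]) > lsd(t_crit, mse, reps[a], reps[b]):
            pairs.append((a, b))
    return pairs


# Hypothetical RCBD summary: 3 treatments in 4 blocks, so error d.f. = 6.
means = {"A": 20.0, "B": 24.5, "C": 22.0}
reps = {"A": 4, "B": 4, "C": 4}
mse = 4.0        # hypothetical error mean square
t_crit = 2.447   # two-sided 5 % point of t on 6 d.f. (from tables)

result = significant_pairs(means, reps, mse, t_crit)
print(f"LSD = {lsd(t_crit, mse, 4, 4):.3f}")
print("Significantly different pairs:", result)
```

Passing the replication of each treatment separately means the same functions also cover the unbalanced case of Exercise 3.1, where each pair of treatments has its own SED.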

TUTORIAL FIVE

Topic: Latin Square Design

Background

In certain situations the variability associated with experimental units is bi-directional. Latin square designs have the ability to control such inherent variation. Latin square designs are square: the number of treatments, the number of rows and the number of columns are equal. The treatments are applied in such a way that each treatment appears once in each row and once in each column. The rows, columns and treatments are assumed to be orthogonal, and thus to have additive effects. This implies that the two-way and three-way interactions between these factors do not exist. These interactions constitute the error component.

Basic plans for these designs are available in most statistics textbooks, such as Cochran and Cox (1957). The approach to using a Latin square design involves the selection of a design plan according to the number of treatments under consideration. The rows of the selected plan are randomized using the randomization procedure, followed by randomization of the columns. The randomization scheme for the rows is independent of the column randomization. These designs are commonly used in animal and factory experiments. The designs are normally denoted as 3x3 Latin squares, ..., 8x8 Latin squares, etc.

In practice, the designs are applicable only to experiments in which the number of treatments is not less than four and not more than eight. For small experiments involving fewer than four treatments, the error degrees of freedom are few, leading to a less sensitive design. Similarly, for large experiments involving more than eight treatments it becomes difficult to maintain homogeneity. Usually, the design works well in experiments where the number of treatments is between five and twelve. In situations where there are fewer than four treatments, multiple Latin squares could be used in order to increase the error degrees of freedom.

Remarks: Exercise 5.1 demonstrates your ability to randomize a Latin square design. Exercises 5.2 and 5.3 use the same information, with the former using direct computation and the latter using the computer. Exercise 5.2 assesses your understanding of the computational procedure for obtaining the ANOVA table, whereas Exercise 5.3 shows how you obtain the same results using the computer.

Exercise 5.1

A researcher was interested in estimating the effects of five types of feed (F1, F2, F3, F4 and F5) on milk production. She selected five animals of relatively different weights (W1, W2, W3, W4 and W5). Five feeding periods (I, II, III, IV and V) were used. Considering the animals to be the columns and the periods to be the rows, a Latin square design plan was selected.

F1 F2 F3 F4 F5
F2 F3 F4 F5 F1
F3 F4 F5 F1 F2
F4 F5 F1 F2 F3
F5 F1 F2 F3 F4

a) Using this plan, demonstrate how randomization could be done. Explain each step you take. Use the following extract of random numbers to set up your randomization scheme and be sure to indicate your final plan.

b) Give the analysis of variance (ANOVA) table showing only the sources of variation and degrees of freedom.

Source of variation    D.F.

Total
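The randomization procedure described in the background (permute the rows of a basic plan, then independently permute the columns) can also be sketched with computer-generated random numbers. The Python example below uses a hypothetical 4x4 cyclic plan with labels A to D, not the plan of Exercise 5.1, and the function names and seed are assumptions made for illustration.

```python
import random


def randomize_latin_square(plan, seed=None):
    """Randomize a basic Latin square plan by permuting its rows and then,
    independently, its columns."""
    rng = random.Random(seed)
    n = len(plan)
    row_order = list(range(n))
    rng.shuffle(row_order)               # step 1: random order of rows
    col_order = list(range(n))
    rng.shuffle(col_order)               # step 2: independent order of columns
    shuffled_rows = [plan[i] for i in row_order]
    return [[row[j] for j in col_order] for row in shuffled_rows]


def is_latin_square(square):
    """Check that every symbol appears exactly once in each row and column."""
    n = len(square)
    symbols = set(square[0])
    rows_ok = all(len(row) == n and set(row) == symbols for row in square)
    cols_ok = all(set(col) == symbols for col in zip(*square))
    return len(symbols) == n and rows_ok and cols_ok


# Hypothetical 4x4 basic (cyclic) plan.
basic_plan = [
    ["A", "B", "C", "D"],
    ["B", "C", "D", "A"],
    ["C", "D", "A", "B"],
    ["D", "A", "B", "C"],
]

randomized = randomize_latin_square(basic_plan, seed=5)
for row in randomized:
    print(" ".join(row))
```

Permuting whole rows and whole columns cannot break the Latin property, which is why `is_latin_square` holds for the randomized plan whatever the seed.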

Exercise 5.2

In an experiment to assess the durability of four different types of carpet, four machines were available to simulate the wear arising from daily use. As it was thought that there might be differences between the conditions in the laboratory on each day that the experiment was run, a Latin square was used. The measurements made were the percentage wear of the carpet. These measurements are given below. The different types of carpet are denoted by the letters A to D. The days are the rows, the machines are the columns and the types of carpet are the treatments.

D 38    A 18    C 38    B 39
A 19    D 22    B 26    C 35
B 41    C 54    A 11    D 36
C 61    B 36    D 22    A 16

a) Give the analysis of variance (ANOVA) table showing only the sources of variation and degrees of freedom.

Source of variation    D.F.

Total

b) Give a mathematical model and state the assumptions associated with it.

c) State the null and alternative hypotheses.

d) Complete the following table.

Treatment          A    B    C    D
Treatment total
Treatment mean

e) Compute the sum of squares (SS) for the following components: total, treatments, rows, columns and error.

f) Compute the corresponding mean squares (MS) for part (e).

g) Present the analysis of variance (ANOVA) table.

Source of variation    D.F.    SS    MS    V.R.    F pr

Total

h) State the null hypothesis for testing the equality of the treatment means, and test the hypothesis at the 5 % level of significance.

i) Compute the standard error of the difference between treatment means.

j) Compute a least significant difference (LSD) at 5 % and determine which treatments are significantly different.

k) What are your overall conclusions based on the results obtained in part (j)?

l) Construct individual 95 % confidence intervals for the treatment mean differences and use these intervals to perform tests of significance.

m) What are your overall conclusions based on the results obtained in part (l)?

Exercise 5.3

This exercise refers to the data presented in Exercise 5.2, where the actual names of the four treatments are now known. The data have been entered into a spreadsheet and stored as F:\Users\Biometry\Biom222\carpet.gsh. Open the data file. Consider the following information on the carpet types:

A: Local material
B: Imported material
C: Local plus imported material (60 %)
D: Local plus imported material (40 %)

a) Carry out the analysis of variance to determine whether the four treatments are significantly different at the 5 % level of significance.

There are two ways to conduct the analysis. One can use either the standard Latin square design or the General Analysis of Variance.

How to use the standard Latin square: On the Stats menu, click on Analysis of Variance. Select General, then scroll down to find Latin Square. Remember that the rows, columns and treatments are factors. If they are not, you need to go back to the spreadsheet and convert them. Click to move the factors and the variate to the appropriate dialogue boxes.

How to use General Analysis of Variance: On the Stats menu, click on Analysis of Variance. Select General, then General Analysis of Variance. Click to move the variate to the Y-variate dialogue box. Click to move the treatment into the Treatment Structure dialogue box. In the Block Structure dialogue box, type Row*Column. This will generate the row, column and row-by-column interaction components.

Present the following information from your output in the table below.

Source of variation    D.F.    SS    MS    V.R.    F pr

Total

b) Construct three meaningful questions (linear contrasts) and test them at the 5 % significance level. The construction is based on the structure of the treatments. For instance, the researcher might be interested in testing whether imported material is any better than local material.

Once you have constructed your three linear contrasts, repeat the analysis, but this time remember to click on Contrasts, indicate that there are three, select the effect (which is the treatment) and then click on the Comparisons box. The process takes you to a small spreadsheet where you enter the coefficients corresponding to the three questions. List the following information associated with these contrasts from your output.

Contrasts    D.F.    Sum of Squares    Mean Square    V.R.    F pr
Contrast 1
Contrast 2
Contrast 3

c) Comment on the ANOVA assumptions using the plots obtained through the Further Output option. Click on Residual Plots to get the plots. Each of the plots tests one or two of the assumptions made about the residuals. (These are normality, independence and constant variance.)

d) Suppose these assumptions are violated. Transform the data using the logarithm function and repeat the analysis, using the transformed data as the response variable. To transform the data, select Transformation on the Data menu. Scroll down to select the appropriate transformation function, in this case the Log function. Click on the variable to be transformed and provide a new name for the transformed data. Choose the option that displays the transformed data in the original spreadsheet containing the data. Provide your output, including the contrasts, in the table below.

Source of variation    D.F.    SS    MS    V.R.    F pr
Total

e) What are your final conclusions?

Exercise 5.4

An ornamental horticulturist conducted a fertilizer experiment in a greenhouse where 5 fertilizer treatments (A, B, C, D, and E) were tested by arranging plants in a Latin square design. Thus rows and columns in the table are rows and columns in the greenhouse. The data below show the yield from the experiment.

A 22    B 23    C 19    D 12    E 14
B 20    C 13    D 16    E 19    A 18
C 14    D 10    E 12    A 26    B 23
D 19    E 18    A 20    B 18    C 14
E 15    A 24    B 20    C 17    D

a) Write down the mathematical model for this design, assuming rows and columns to be random effects.

b) List the parameters to be estimated from the model stated in part (a).

c) List the necessary assumptions for the model stated in part (a).

d) State the null hypotheses to test the:
i) Fertilizer effects (consider the effects fixed),
ii) Row effects (consider the effects random),

iii) Column effects (consider the effects random).

e) Enter the data into a spreadsheet and save it as hort.gsh. (You will require 4 columns and 25 rows, where the fertilizer, row and column are converted to factors and the yield is a variate.)

f) Conduct the analysis of variance and present your output in the table below.

Source of variation    D.F.    SS    MS    V.R.    F pr
Total

g) Using a 1 % level of significance, determine whether the mean yields are equal for the 5 fertilizers.

h) Suppose the answer in part (g) is that they are different. Use a least significant difference (LSD) test at the 1 % level of significance to determine which means are different.

i) What proportion of the total variation is accounted for by columns?

j) Comment on the validity of some of the assumptions stated in part (c), using plots obtained through the Further Output option.

k) Compute the relative efficiency of the Latin square design over the RCBD (with rows as blocks) and interpret the value you obtain.

l) What are your overall conclusions in terms of the best fertilizer treatment and the effectiveness of the design?
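The relative efficiency calculation can be sketched as follows. This is a minimal Python illustration of the usual textbook formula, RE = [MS(dropped blocking factor) + (t - 1) MSE] / (t MSE), where the "dropped" factor is the one the RCBD would not use (columns, when rows are kept as blocks). The function name and the mean squares below are hypothetical, and the small correction for the difference in error degrees of freedom is ignored.

```python
def relative_efficiency(ms_dropped, ms_error, t):
    """Relative efficiency of a t x t Latin square versus an RCBD.

    ms_dropped: mean square of the blocking factor the RCBD would NOT use
                (the column mean square when rows are kept as blocks).
    ms_error:   error mean square from the Latin square ANOVA.
    t:          number of treatments (also the number of rows and columns).
    """
    return (ms_dropped + (t - 1) * ms_error) / (t * ms_error)


# Hypothetical mean squares, for illustration only (not from hort.gsh).
re_value = relative_efficiency(ms_dropped=10.0, ms_error=2.0, t=5)
print(f"Relative efficiency: {re_value:.2f}")
```

A value above 1 suggests that the extra blocking direction improved precision; a value near or below 1 suggests that the RCBD would have done about as well.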

TUTORIAL SIX

Topic: Split-Plot Design

Background

A split-plot design involves a two- or higher-order treatment structure with an incomplete block design structure and at least two different sizes of experimental units. The simplest split-plot design involves two factors, where the levels of one factor are randomly applied to the whole-plots and the levels of the other factor are applied to the sub-plots within each whole-plot. It implies that the treatment to be measured with higher precision is applied to the smaller experimental unit and the treatment needing less precision is applied to the larger unit. Consequently, the interaction is measured with higher precision.

The whole-plot treatment can be applied in any design structure, depending on the nature of the experimental units. For instance, if a CRD is to be used, the whole-plot treatment is randomly assigned to the units. The whole-plot is then sub-divided into smaller units called sub-plots. The whole-plot is taken as a block with respect to the sub-plot treatments, and all randomization procedures for the RCBD apply. That is, an independent randomization scheme is used for each whole-plot. Similarly, if the experimental units are first grouped into blocks such that variability within blocks is minimized and variability between blocks is maximized, then the whole-plot treatment design structure is an RCBD. The whole-plot factor levels are randomized within each block using a new randomization scheme for each block. Again, the whole-plots act as blocks for the sub-plot factor levels.

The selection of a split-plot design depends on the practicability of the treatments, for example applying fertilizer to a whole-plot and varieties to a sub-plot. The fact that there are two sizes of experimental unit implies that there are two experimental errors, referred to here as error (a) and error (b). The plot layout requires the whole-plot treatments to be randomly applied to the whole-plots, and then the sub-plot treatments to be randomly applied within each whole-plot.

It should be noted that the experimental error from an RCBD is split into error (a) and error (b) when one decides to apply a split-plot. There must be a reason behind the use of a split-plot. A split-plot design should not be used when there is no reason why one factor should be assessed with higher precision than the other. Another reason for using a split-plot is when one factor requires a bigger experimental plot than the other because of management practice. For instance, irrigation or the use of a tractor may require a bigger plot.

Remark: Exercise 6.1 tests your ability to randomize the factor levels using a random number table in a split-plot layout with the whole-plot factor established in an RCBD.

Exercise 6.1

An experiment was conducted to determine the performance of three oat varieties under several levels of nitrogen with respect to yield. The varieties (V1 - Marvellous, V2 - Victory and V3 - Golden rain) were the whole-plot treatments and the nitrogen levels (N0 - 0 cwt, N1 - 0.2 cwt, N2 - 0.4 cwt, and N3 - 0.6 cwt) were the sub-plot treatments. The whole-plot treatments were replicated 3 times. The total number of experimental units required is (3 variety levels) x (4 nitrogen levels) x (3 replications) = 36. Consider the following extract from a random number table

a) Determine the random layout of the field experiment. (Show a random layout of the treatments in the plots provided below and explain each step you make.) You need to randomize the varieties within each block first and then randomize the nitrogen levels within each whole-plot. The whole-plot acts as a block for the sub-plot treatments. Use the notations V1-V3 and N0-N3 to show the layout.

Block I        Block II        Block III

Outline the steps you have taken in the randomization process in the space provided below.

b) Give the analysis of variance (ANOVA) table showing only the sources of variation and degrees of freedom.

Source of variation    D.F.
Error (a)
Error (b)
Total
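The two-stage randomization described in Exercise 6.1 can be sketched in Python as shown below. This is only an illustration under stated assumptions: it uses Python's pseudo-random generator in place of the random number table the exercise asks for, and the function name `split_plot_layout` and the seed are hypothetical.

```python
import random


def split_plot_layout(blocks, whole_plot_levels, sub_plot_levels, seed=None):
    """Randomize a split-plot layout with whole-plots arranged in an RCBD."""
    rng = random.Random(seed)
    layout = {}
    for block in blocks:
        # Stage 1: a fresh randomization of whole-plot treatments in each block.
        whole_order = rng.sample(whole_plot_levels, k=len(whole_plot_levels))
        # Stage 2: a fresh randomization of sub-plot treatments in each whole-plot.
        layout[block] = [
            (whole, rng.sample(sub_plot_levels, k=len(sub_plot_levels)))
            for whole in whole_order
        ]
    return layout


layout = split_plot_layout(
    blocks=["I", "II", "III"],
    whole_plot_levels=["V1", "V2", "V3"],
    sub_plot_levels=["N0", "N1", "N2", "N3"],
    seed=1,  # arbitrary seed so the layout can be reproduced
)
for block, whole_plots in layout.items():
    for variety, nitrogen_order in whole_plots:
        print(f"Block {block}: {variety} -> {' '.join(nitrogen_order)}")
```

Each block receives every variety exactly once, and each whole-plot receives every nitrogen level exactly once, which gives the 36 units stated in the exercise.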

Exercise 6.2

Refer to the data stored in F:\Users\Biometry\Biom222\golden.gsh, which relate to Exercise 6.1 and are set up to answer the following questions.

a) Carry out the analysis to test the effects of variety, nitrogen and the variety-by-nitrogen interaction at the 5 % significance level. Once you get the data, display them in a spreadsheet and proceed with the analysis as follows: On the Stats menu, select Analysis of Variance, click on General and then Split Plot Design. Double click to move the variate and factors to the appropriate dialogue boxes in order to obtain the initial ANOVA. Remember to click on Further Output to get the residual plots required to check the assumptions. To get the estimates of the various variance components, click on Options and ensure the stratum variance box is selected. Present the ANOVA output in the table.

Source of variation    D.F.    SS    MS    V.R.    F pr
Error (a)
Error (b)
Total

b) Attempt to obtain the same results as in (a) by using General Analysis of Variance. On the Stats menu, click on Analysis of Variance. Select General, then General Analysis of Variance. Click to move the variate to the Y-variate dialogue box. Click to move the treatments into the treatment structure dialogue box as Variety*Nitrogen. This command will generate Variety+Nitrogen+Variety.Nitrogen. In the block structure dialogue box, type Block/Variety. This line will automatically generate Block+Block.Variety. Note that Block.Variety is our Error (a). We do not need to specify Error (b) since it will be generated automatically.

c) List the estimated variance components in the table below:

Name of variance component    Estimate of variance component    Percentage
Block
Block.Variety
Residual
Total

i) Where do we have the most variability? Interpret what this implies with respect to the overall design.

ii) Were the blocks effective in controlling variability? Why or why not?
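One way to see where the variance component estimates come from is the method-of-moments calculation for a balanced split-plot in an RCBD, sketched below in Python. It assumes the usual expected mean squares (error (b) estimates the residual variance, error (a) adds b times the whole-plot variance, and the block mean square adds a x b times the block variance). The function name and the mean squares are hypothetical and are not GenStat output.

```python
def split_plot_variance_components(ms_block, ms_error_a, ms_error_b, a, b):
    """Method-of-moments variance components for a balanced split-plot.

    a: number of whole-plot treatment levels (e.g. varieties)
    b: number of sub-plot treatment levels (e.g. nitrogen levels)
    Estimates can come out negative when a mean square is smaller than
    the one below it; they are left as computed here.
    """
    components = {
        "Block": (ms_block - ms_error_a) / (a * b),
        "Block.Variety": (ms_error_a - ms_error_b) / b,
        "Residual": ms_error_b,
    }
    total = sum(components.values())
    percentages = {name: 100 * value / total for name, value in components.items()}
    return components, percentages


# Hypothetical mean squares, for illustration only (not from golden.gsh).
components, percentages = split_plot_variance_components(
    ms_block=100.0, ms_error_a=40.0, ms_error_b=10.0, a=3, b=4
)
for name in components:
    print(f"{name:15s} {components[name]:8.2f} {percentages[name]:6.1f} %")
```

The percentage column is what answers the question of where most of the variability lies.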

Exercise 6.3

In an experiment to study the effect of two meat-tenderizing chemicals, the two (back) legs were taken from four carcasses of beef, and one leg was treated with chemical 1 and the other with chemical 2. Three sections were then cut from each leg and allocated (at random) to three cooking temperatures, all 24 sections (4 carcasses x 2 legs x 3 sections) being cooked in separate ovens. The table below shows the force required to break a strip of meat taken from each of the cooked sections.

Leg Carcass Section Chemical Temp Force Chemical Temp Force

Consider chemical as the whole-plot treatment and temp as the sub-plot treatment. The whole-plot treatment was laid out in a randomised complete block design. The carcasses are the blocks.

a) Provide a schematic sketch of how the treatments are applied.

b) Is it possible to separate the effect of legs from that of the chemical? Why or why not?

c) Give an outline of the ANOVA identifying the sources of variation and degrees of freedom.

Source of variation    Degrees of freedom
Error (a)
Error (b)
Total

d) Enter the data into a spreadsheet and save it as carcass.gsh. Note: You require 4 columns (Carcass, Chemical, Temp, and Force) and 24 rows.

e) Carry out a detailed analysis and summarise your output in the following table.

Source of variation    D.F.    SS    MS    V.R.    F pr
Error (a)
Error (b)
Total

f) Obtain the two-way table of the chemical and temperature means and the associated standard errors.

g) Illustrate how the standard errors given in part (f) are computed.

h) Present your overall conclusions, which could be used to write the final report.
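For part (g), the textbook formulas for the standard errors of differences (s.e.d.) between means in a balanced split-plot can be sketched as below. This is a Python illustration, not GenStat output: the function name and the two error mean squares are hypothetical, with r blocks, a whole-plot levels and b sub-plot levels (r = 4, a = 2, b = 3 in Exercise 6.3).

```python
from math import sqrt


def split_plot_seds(ms_error_a, ms_error_b, r, a, b):
    """Standard errors of differences between means, balanced split-plot."""
    return {
        # Two whole-plot treatment means (averaged over sub-plot levels).
        "whole-plot means": sqrt(2 * ms_error_a / (r * b)),
        # Two sub-plot treatment means (averaged over whole-plot levels).
        "sub-plot means": sqrt(2 * ms_error_b / (r * a)),
        # Two sub-plot means at the same whole-plot level.
        "sub-plot means, same whole-plot level": sqrt(2 * ms_error_b / r),
        # Two whole-plot means at the same or different sub-plot levels.
        "whole-plot means, same or different sub-plot level": sqrt(
            2 * ((b - 1) * ms_error_b + ms_error_a) / (r * b)
        ),
    }


# Hypothetical error mean squares, for illustration only.
seds = split_plot_seds(ms_error_a=12.0, ms_error_b=6.0, r=4, a=2, b=3)
for comparison, value in seds.items():
    print(f"{comparison}: {value:.3f}")
```

The last comparison mixes both error terms, which is why it has no exact degrees of freedom and is usually tested with an approximate t-value.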

TUTORIAL SEVEN

Topic: Split-Split-Plot Design

Background

A split-split-plot design arises when more than two sizes of experimental unit are used. Three sizes of experimental unit require three stages of randomisation, which correspond to three sources of experimental error, say errors (a), (b) and (c). The decision on which treatment goes to which plot is determined by the required precision, as well as by plot management. Note that the whole-plot treatment could be applied in any design structure, such as CRD, RCBD, Latin squares, etc. The process of split-split-plot layout can be extended to any further level. The number of sizes of experimental unit and the number of error terms increase together.

Again, it should be noted that the standard errors from such an analysis are incorrect, especially for unbalanced data or data characterised by missing values. This implies that test statistics for comparing means, and confidence intervals, are inappropriate. It is therefore recommended that the REML procedure be applied, because it provides correct standard errors and hence appropriate test statistics and confidence intervals.

Exercise 7.1

Consider an experiment on grain yields of three rice varieties grown under three management practices and five nitrogen levels. The experiment was carried out in a split-split-plot layout with Nitrogen as the whole-plot factor, Management Practice as the subplot factor, and Variety as the sub-subplot factor, with the whole-plot treatment applied in an RCBD with three replications. The data are stored in F:\Users\Biometry\Biom222\spltspltplot.gsh.

a) Provide a schematic sketch of how the treatments are applied.

b) Write down the mathematical model for this design and state the assumptions.

c) Attempt to present a sketch of the format of the ANOVA table giving only the sources of variation and degrees of freedom.

Source of variation    Degrees of freedom
Error (a)
Error (b)
Error (c)
Total

d) Carry out the analysis using the standard split-split-plot design available from the Analysis of Variance option.

Source of variation    D.F.    SS    MS    V.R.    F pr
Error (a)
Error (b)
Error (c)
Total

e) Conduct the tests of hypotheses using the 5 % significance level and draw the necessary conclusions.

f) Present the various two-way tables of treatment means and the associated standard errors.

g) Indicate the proportions of total variation accounted for by each of the random components.
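The degrees of freedom in a balanced split-split-plot ANOVA skeleton follow a regular pattern, which the Python sketch below writes out so that a hand-built table can be checked. The function name is hypothetical; r is the number of blocks and a, b, c are the numbers of levels of the whole-plot, subplot and sub-subplot factors (r = 3, a = 5, b = 3, c = 3 for Exercise 7.1 as described).

```python
def split_split_plot_df(r, a, b, c):
    """Degrees of freedom for a balanced split-split-plot in an RCBD."""
    df = {
        "Block": r - 1,
        "A": a - 1,
        "Error (a)": (r - 1) * (a - 1),
        "B": b - 1,
        "A.B": (a - 1) * (b - 1),
        "Error (b)": a * (r - 1) * (b - 1),
        "C": c - 1,
        "A.C": (a - 1) * (c - 1),
        "B.C": (b - 1) * (c - 1),
        "A.B.C": (a - 1) * (b - 1) * (c - 1),
        "Error (c)": a * b * (r - 1) * (c - 1),
    }
    df["Total"] = r * a * b * c - 1
    return df


# A = Nitrogen (5 levels), B = Management (3), C = Variety (3), 3 blocks.
df_table = split_split_plot_df(r=3, a=5, b=3, c=3)
for source, dof in df_table.items():
    print(f"{source:10s} {dof:4d}")
```

A useful check is that all the rows above the total add up to the total degrees of freedom.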

Exercise 7.2

Use the General Analysis of Variance option to obtain results similar to those obtained using the standard split-split-plot design in Exercise 7.1 part (d).

Note: For the layout of Exercise 7.1 (Nitrogen on whole plots, Management on subplots, Variety on sub-subplots), the block structure Block/Nitrogen/Management/Variety generates:
Block + Block.Nitrogen + Block.Nitrogen.Management + Block.Nitrogen.Management.Variety

The treatment structure Variety*Management*Nitrogen generates:
Variety + Management + Variety.Management + Nitrogen + Variety.Nitrogen + Management.Nitrogen + Variety.Management.Nitrogen

Recall that the order of the factors in the nested block structure matters.

Source of variation    D.F.    SS    MS    V.R.    F pr
Error (a)
Error (b)
Error (c)
Total
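How the nesting operator (/) and the crossing operator (*) expand into model terms can be illustrated with a short Python sketch. It only mimics the expansion rules for a simple chain of factors and is not GenStat code; the function names are hypothetical, and the factor order used for the block structure assumes the whole-plot, subplot, sub-subplot order described in Exercise 7.1.

```python
def expand_nested(factors):
    """Expand A/B/C into A, A.B, A.B.C (each factor nested in those before)."""
    return [".".join(factors[:i]) for i in range(1, len(factors) + 1)]


def expand_crossed(factors):
    """Expand A*B*C into all main effects and interactions."""
    terms = []
    for factor in factors:
        # Each new factor enters on its own and with every earlier term.
        terms += [factor] + [f"{term}.{factor}" for term in terms]
    return terms


block_terms = expand_nested(["Block", "Nitrogen", "Management", "Variety"])
treatment_terms = expand_crossed(["Variety", "Management", "Nitrogen"])
print(" + ".join(block_terms))
print(" + ".join(treatment_terms))
```

Reordering the factors in the crossed formula gives the same set of terms, whereas reordering the nested formula changes which factor defines each stratum, and therefore which error term each effect is tested against.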

Exercise 7.3

A study was conducted to determine the influence of plant density and hybrids on maize yield. The experiment was a 2x2x3 factorial replicated 4 times in a randomized complete block design arranged in a split-split-plot layout. In this experiment, factor A is the 2 maize hybrids (P3730 and B70xLH55) assigned to the main plots, factor B is the 2 row spacings (12 and 25 inches) assigned to the subplots, and factor C is the 3 target plant densities (12 000, , and plants per acre) assigned to the sub-subplots. The data are presented below:

Grain Yield (Bushels per Acre)
Hybrid    Row Spacing    Plant Density ( 000)    Replications: I    II    III    IV
P
B70xLH

a) Enter the data into a spreadsheet and save it as maizeyld.gsh. Hint: Label the columns as: Rep, Hybrid, RowSp, PlantD, Yield.

b) Attempt to present a sketch of the format of the ANOVA table giving only the sources of variation and degrees of freedom.

Source of variation    Degrees of freedom
Error (a)
Error (b)
Error (c)
Total

c) Carry out the analysis using the standard split-split-plot design available from the Analysis of Variance option.

Source of variation    D.F.    SS    MS    V.R.    F pr
Error (a)
Error (b)
Error (c)
Total

d) Test the various hypotheses for fixed effects and provide the conclusions.

e) Present the various two-way tables of treatment means and the associated standard errors.

f) Indicate the proportions of total variation accounted for by each of the random components.

TUTORIAL EIGHT

Topic: Repeated Measures Experiment

Background

Repeated measures designs are often called split-plot in time. In split-plot experiments the errors are assumed to be independent, whereas in repeated measures experiments they are thought to be correlated as a result of the inability to randomize time periods. Repeated measures designs involve one or more steps where the experimenter cannot randomly assign the levels of one or more treatments to a given size of experimental unit. A size of experimental unit is sometimes determined by a time interval, when a given unit is observed at different points in time (e.g. growing plants, growing animals, etc.).

Consider a situation where a treatment is applied to an animal (assumed to be an experimental unit) and observations are taken over time, say after one, two, three, etc. weeks. It is impossible to randomize time, which is considered to be the subplot. This is different from the ordinary split-plot, where the subplot treatment can be randomised. Having to take multiple measurements on the same experimental unit forms the basis for a repeated measures experiment. These experiments are quite common with animal or tree experiments.

Most experiments with these kinds of characteristics have been analyzed as if they were separate experiments done over time. This approach is inadequate because it ignores time completely. The experiment is such that subjects, which may be different for each treatment, are nested within treatments but crossed with time (considering time as a factor). Treatments are applied to different subjects (one size of unit) and observations are taken on a subject at different time intervals. The fact that the design leads to correlation of responses through time, and that time cannot be randomised, makes the design different from a split-plot design.

Exercise 8.1

An experiment was carried out at Ukulinga farm to study the effect of different sources of fertilizer on the growth of Swiss chard. Seven treatment combinations were applied in an RCBD with 4 replications. The harvests were done at 8 time intervals.

Treatments: 1 - Control; 2 - Chemical fertilizer at 50 %; 3 - Chemical fertilizer at 100 %; 4 - Compost at 50 %; 5 - Compost at 100 %; 6 - Biodigester liquid at 50 %; and 7 - Biodigester liquid at 100 %.

a) Refer to the data in F:\Users\Biometry\Biom222\Swissch.gsh. Carry out the analysis as a split-plot design.
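Analysing repeated measures as a split-plot in time treats harvest time as a sub-plot factor, which implicitly assumes that measurements at any two times on the same plot are about equally correlated. A quick way to look at this is to compute the correlations between time points across plots, as sketched below in Python. The numbers are made up purely for illustration (they are not the Swiss chard data), and in practice the correlations would be computed from residuals after removing treatment and block effects.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


def time_correlation_matrix(rows):
    """rows[i][j] is the response of plot i at time j."""
    columns = list(zip(*rows))  # one sequence per time point
    return [[pearson(ci, cj) for cj in columns] for ci in columns]


# Hypothetical responses: 4 plots (rows) measured at 3 harvests (columns).
responses = [
    [1.0, 2.0, 4.0],
    [2.0, 3.0, 5.0],
    [3.0, 5.0, 6.0],
    [4.0, 6.0, 9.0],
]
corr = time_correlation_matrix(responses)
for row in corr:
    print(" ".join(f"{value:6.3f}" for value in row))
```

If correlations fall off clearly as harvests get further apart, the equal-correlation assumption behind the split-plot analysis is doubtful and the F-tests involving time should be interpreted with caution.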
