Completely randomized designs, Factors, Factorials, and Blocking

STAT:5201 Week 2: Lecture 1

Completely Randomized Design (CRD)
* Simplest design set-up
* Treatments are randomly assigned to EUs
* Easiest to do
* Easiest to analyze
* Often sufficient for the goals
Example (One-factor study CRD)
DayLength (short/long) is the only factor in the study. We have eight hamsters (EUs). We use a random number table to assign the short DayLength to 4 hamsters and the long DayLength to the other 4 hamsters. NI Enzyme level is recorded at the end of the study.

Completely Randomized Design (CRD)
A completely randomized design (CRD) has
* $N$ units
* $g$ different treatments
* $n_i$ observations in treatment $i$, where $\sum_{i=1}^{g} n_i = N$
* Completely random assignment of treatments to units
Completely random assignment means that every possible division of the units into $g$ groups with the given sample sizes is equally likely.
Example (One-factor study CRD)
In this single factor study, hamsters have equal probability of being assigned to either of the 2 treatments. $N = 8$, $g = 2$, $n_1 = n_2 = 4$.
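
The randomization described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `randomize_crd`, the hamster labels, and the seed are made up for the example, and the slides themselves use a random number table rather than software.

```python
import random
from math import comb

def randomize_crd(units, treatments, sizes, seed=None):
    """Randomly assign treatments to units with fixed group sizes (a CRD)."""
    if sum(sizes) != len(units):
        raise ValueError("group sizes must sum to the number of units")
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)  # every ordering of the units is equally likely
    assignment = {}
    start = 0
    for treatment, size in zip(treatments, sizes):
        # the next `size` units in the shuffled order receive this treatment
        for unit in shuffled[start:start + size]:
            assignment[unit] = treatment
        start += size
    return assignment

# Hypothetical labels for the N = 8 hamsters (EUs)
hamsters = [f"hamster_{i}" for i in range(1, 9)]
assignment = randomize_crd(hamsters, ["short", "long"], [4, 4], seed=5201)

# Number of distinct ways to pick which 4 of the 8 hamsters get "short"
n_assignments = comb(8, 4)

print(assignment)
print(n_assignments)  # 70
```

Shuffling the units and then cutting the shuffled list into groups of the required sizes makes each of the 70 possible splits equally likely, which is the defining property of the CRD.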

Completely Randomized Design (CRD)
Example (CRD single factor experiment)
The number of times a rod was used to remove entrapped air from a concrete sample was used as the design variable in an experiment. The response variable was compressive strength of the concrete. Three runs were done at each of 4 levels of the factor Rodding Level (10, 15, 20, 25). This was a CRD as the treatments were randomly assigned to the 12 runs.
Rodding Level   Compressive Strength
10              1530  1530  1440
15              1610  1650  1500
20              1560  1730  1530
25              1500  1490  1510
Montgomery, Applied Stat and Prob Engrs (2011)
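
As a rough check on these data, the usual one-way ANOVA quantities can be computed by hand. The sketch below is not the SAS analysis from the lecture. It is a plain Python calculation of the treatment and error sums of squares and the F statistic, and the function name `one_way_anova` is made up for the example.

```python
# Compressive strength by rodding level, copied from the table above
strength = {
    10: [1530, 1530, 1440],
    15: [1610, 1650, 1500],
    20: [1560, 1730, 1530],
    25: [1500, 1490, 1510],
}

def one_way_anova(groups):
    """Return group means, SS_treatment, SS_error and the F statistic."""
    all_obs = [y for ys in groups.values() for y in ys]
    n_total, g = len(all_obs), len(groups)
    grand_mean = sum(all_obs) / n_total
    means = {k: sum(ys) / len(ys) for k, ys in groups.items()}
    # between-treatment variation: group means around the grand mean
    ss_trt = sum(len(ys) * (means[k] - grand_mean) ** 2
                 for k, ys in groups.items())
    # within-treatment variation: observations around their own group mean
    ss_err = sum((y - means[k]) ** 2
                 for k, ys in groups.items() for y in ys)
    ms_trt = ss_trt / (g - 1)        # g - 1 = 3 degrees of freedom
    ms_err = ss_err / (n_total - g)  # N - g = 8 degrees of freedom
    return means, ss_trt, ss_err, ms_trt / ms_err

means, ss_trt, ss_err, f_stat = one_way_anova(strength)
print(means)
print(round(f_stat, 2))  # 1.87
```

The F statistic would be compared with an F distribution on 3 and 8 degrees of freedom. The p-value is left out here because it needs a statistics library.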

Completely Randomized Design (CRD)
Example (CRD single factor experiment - SAS)
Diagnostic plots:
1) check constant variance [violated]
2) check normality

Completely Randomized Design (CRD)
In the single factor experiment, the usual items of interest are...
* Is there evidence that some means are different?
* If they are different, which are different from each other?
* Is there any pattern in the differences?
* Estimates/confidence intervals of means and differences.
* In some special cases, variability may be of interest.

Completely Randomized Design (CRD)
A completely randomized design (CRD) can have more than one factor.
Example (CRD two-factor experiment)
Besides DayLength (short/long), researchers are interested in a Climate (cold/warm) effect. The combination of these two factors gives four treatment groups in this study. As a CRD, we will randomly assign the 4 treatment groups to the 8 hamsters (placing 2 hamsters in each treatment group).

Completely Randomized Design (CRD)
It can be useful to perceive a CRD two-factor study as a single factor study where the single factor, or "superfactor", has levels represented by the two-factor treatment groups.
Example (CRD two-factor experiment)
This two-factor CRD study could also be perceived as a single superfactor experiment... You might think of it this way when you're doing the randomization, or for reasons of convenience that may come up later.
In CRDs, there is no blocking or nesting. Given the treatment group, the observations are independent.

Multifactor Experiments
In an experiment, factors can be either...
* controlled
* controlled for
* left uncontrolled
* held fixed
If we have the ability to choose and set the levels of a factor, then this factor can be controlled.
* Choosing dosage levels of a drug (10ml, 20ml, 30ml,...)
* Choosing the temperature at which to run a process (250, 300,...)

Multifactor Experiments
Sometimes we want to include a factor in our study because it is likely to be a large source of variation, but we don't have the power to assign the levels. Then, we instead control for the factor.
* Sex
* Age
* Genetic background, family group
* Income level
Only factors having relatively small effects on the response should be left uncontrolled. Randomization should take care of these small effects, in the sense that our results won't be biased. And obviously, factors having small effects that we are unaware of are left uncontrolled.
Holding a factor fixed is an option, but it means you've narrowed the scope of your experiment.

Multifactor Experiments
Factors of interest can be called Primary Factors.
Other factors may be included in the study because they are known to be a large source of variation in the response, even though they are not of primary interest. These factors can be called Nuisance Factors.
For example, we are interested in comparing two drugs or drug brands (primary factor), and we know that age group (nuisance factor) may also be related to the response, but we are not really interested in detecting or estimating an age effect.

Multifactor Experiments
When deciding if a factor is a primary factor or not, I might ask a client: "Do you want a formal p-value comparing the different levels of the factor?" In other words, do you want to be able to say... "We found Drug A to be significantly different than Drug B..." If so, then drug effect is a primary factor.
Only nuisance factors, not primary factors, are used as blocking factors. This is because we're not interested in how the response changes from one nuisance block to the next. We're really interested in how the response changes from one treatment to the next within a block.
A factor that is used as a blocking factor is usually confounded with other nuisance factors, and that means any observed differences between the blocks could be due to something other than the block itself (e.g. what looked like an age effect was actually a day effect).

Factorial Experiments
Factorials are the simplest kind of multifactor experiment.
* design consists of two or more factors
* there is no blocking
* there is no nesting
* CRD set-up, assigning treatments to EUs
Example (Two-factor factorial)
Revisiting our earlier example, we have 8 EUs and 4 treatments, formed from the combinations of DayLength (short/long) and Climate (cold/warm), that will be randomly assigned to EUs (balanced) as a CRD.
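
A minimal sketch of how the four factorial treatments could be generated and randomly assigned in a balanced way is given below. The hamster labels, the seed, and the variable names are assumptions made for the illustration and are not part of the original study.

```python
import random
from itertools import product

day_length = ["short", "long"]
climate = ["cold", "warm"]

# Each treatment is one combination of factor levels: 2 x 2 = 4 treatments
treatments = list(product(day_length, climate))

# Balanced design: 2 replicates per treatment gives 8 treatment labels
reps = 2
labels = treatments * reps

rng = random.Random(15)  # arbitrary seed so the plan is reproducible
rng.shuffle(labels)      # complete randomization, no blocking or nesting

# Hypothetical labels for the 8 hamsters (EUs)
hamsters = [f"hamster_{i}" for i in range(1, 9)]
plan = dict(zip(hamsters, labels))

for hamster, (day, clim) in plan.items():
    print(hamster, day, clim)
```

Treating each (DayLength, Climate) pair as one label is the "superfactor" view: the randomization is the same as for a single factor with four levels.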

Interaction Plots
Interaction Plots or Profile Plots
An interaction plot is constructed by plotting the cell means for each combination of factors on the vertical coordinate, with the levels of one of the factors as the horizontal coordinate.
When you have two factors, either one can be used as the factor along the x-axis. Often, one of the two possible plots seems better for interpretation purposes.

Interaction Plots
Interaction Plots or Profile Plots
Interaction plots make it easy to see information in the data quickly...
* Which treatment gives the highest response? Lowest response?
If the lines are parallel, then the effects of the two factors are said to be Additive and there is no interaction.
If the lines are not parallel, then we say the two factors interact, or that there is interaction between the factors.
If we have additive effects, then the effects of a factor are the same for all levels of the other factor.
If there is interaction, then the effects of a factor are different at differing levels of the other factor.

Possible Interaction Plots - two factors
Interaction Plots or Profile Plots
Suppose we have two factors A and B, each with two levels (low and high), in a factorial CRD experiment. There are a number of possible observed interaction plots.

Possible Interaction Plots - two factors
Interaction Plots or Profile Plots
Parallel lines. No interaction is present. The effect of factor B is essentially the same for all levels of factor A. We say the effects of these factors are additive effects.

Possible Interaction Plots - two factors
Interaction Plots or Profile Plots
Non-parallel lines. Interaction is present. The effect of factor B depends on the level of factor A.
On the left, the effect of factor B is much larger when factor A is set at high. On the right, the effect of factor B is not only larger when factor A is set at high, but it's in the opposite direction!
NOTE: The cell means give us an idea about interaction, but we need to formally test for an interaction in our modeling, and not rely on a graphic suggestion.
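
The "parallel lines" idea can be expressed numerically. In the sketch below, the cell means are hypothetical numbers chosen for illustration, and the helper names `effect_of_b` and `interaction_contrast` are made up. The contrast is zero exactly when the two lines in a 2 x 2 interaction plot are parallel.

```python
def effect_of_b(means, a_level):
    """Change in the cell mean when B goes from low to high, at one level of A."""
    return means[(a_level, "high")] - means[(a_level, "low")]

def interaction_contrast(means):
    """Difference between the B effect at A = high and the B effect at A = low."""
    return effect_of_b(means, "high") - effect_of_b(means, "low")

# Hypothetical cell means keyed by (level of A, level of B)
additive = {
    ("low", "low"): 10.0, ("low", "high"): 14.0,
    ("high", "low"): 12.0, ("high", "high"): 16.0,
}
interacting = {
    ("low", "low"): 10.0, ("low", "high"): 14.0,
    ("high", "low"): 12.0, ("high", "high"): 22.0,
}

print(interaction_contrast(additive))     # 0.0 -> parallel lines, additive effects
print(interaction_contrast(interacting))  # 6.0 -> B effect is larger when A is high
```

A nonzero contrast computed from observed cell means is only a suggestion of interaction, just like the plot. A formal test in the model is still needed.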

Interaction Plots - three factors
Higher orders of interaction
The previous discussion has focused on 2-way interaction. But you can have a 3-way, 4-way, 5-way,... interaction as well.
These interactions quickly become difficult to interpret, and difficult to deal with (without partitioning the data into subsets).
Often, we hope (or maybe just assume) that these higher order interactions are not present. But if we are able to test for them, we should.

Interaction Plots - three factors
Higher orders of interaction
Three-way interaction exists among factors if the 2-way interaction in one plot, for a given level of the 3rd factor, differs from the 2-way interaction plot when the 3rd factor is at a different level.
Example (Three factor factorial)
Consider a three-factor factorial CRD with factors A, B, and C, each with a low or high level. There are 8 treatments.
[Figure: two side-by-side interaction plots, each with factor A (L, H) on the horizontal axis; handwritten annotations not recoverable.]
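
The definition of three-way interaction above can be illustrated with a small calculation: compute the two-way (A by B) interaction contrast separately at each level of C and compare the two. The cell means and function names below are hypothetical and are used only to illustrate the idea.

```python
def two_way_contrast(means, c_level):
    """A-by-B interaction contrast at a fixed level of C.

    Cell means are keyed by (level of A, level of B, level of C).
    """
    b_effect_at_a_high = means[("H", "H", c_level)] - means[("H", "L", c_level)]
    b_effect_at_a_low = means[("L", "H", c_level)] - means[("L", "L", c_level)]
    return b_effect_at_a_high - b_effect_at_a_low

def three_way_contrast(means):
    """How much the A-by-B interaction changes between C = L and C = H."""
    return two_way_contrast(means, "H") - two_way_contrast(means, "L")

# Hypothetical cell means for the 2 x 2 x 2 = 8 treatments
means = {
    ("L", "L", "L"): 10.0, ("L", "H", "L"): 14.0,
    ("H", "L", "L"): 12.0, ("H", "H", "L"): 16.0,
    ("L", "L", "H"): 10.0, ("L", "H", "H"): 14.0,
    ("H", "L", "H"): 12.0, ("H", "H", "H"): 22.0,
}

print(two_way_contrast(means, "L"))  # 0.0 -> no A-by-B interaction when C is low
print(two_way_contrast(means, "H"))  # 6.0 -> A-by-B interaction when C is high
print(three_way_contrast(means))     # 6.0 -> the two plots differ: 3-way interaction
```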

Interaction Plots - three factors
Higher orders of interaction
Example (Three factor factorial)
Another possibility...
[Figure: two side-by-side interaction plots, each with factor A (L, H) on the horizontal axis; handwritten annotations not recoverable.]

Blocking
As we move away from completely randomized designs to more complex designs, the first design we will consider is the randomized complete block design (RCBD).
A block of units is a set of units that are homogeneous in some sense.
To form blocks, we organize EUs into groups having similar characteristics.
EUs              Possible Blocking factor
patients         age (11-20, 21-30,..., 81-90)
patients         family
patients         hospital giving the care
factory workers  shift (early, late, night)
field plots      location
field plots      soil composition
Remember, don't use a primary factor of interest for blocking. A blocking factor should be a nuisance factor.

Blocking
Other ways to form a block (ideally, we want to be able to observe all treatments within each block):
* Physically divide an object into parts, as in a manufacturing setting.
* Repeat testing of the same object under the different conditions (like when you give both drugs to the same person).
An RCBD utilizes restricted randomization, where we randomly assign the treatments to the EUs within a block. Thus, if there are $r$ blocks, then we will do $r$ restricted randomizations, one for each block, to assign the treatments to the EUs.
In a complete block design, all treatments are observed in each block. If you cannot observe all treatments in each block, then you have an incomplete block design.

Blocking
Blocked designs are not CRDs.
Blocking is a variance reduction technique. Block-to-block variability is still in the data, but we essentially remove this variability when comparing treatments (because we see all treatments within a block).
Blocking is most useful when there is wide variability across blocks. We don't usually test for a block effect because we EXPECT a large difference across blocks (that's exactly why we're using it), and it's a nuisance factor anyway.
In general, we assume there is no interaction between the block and the treatment. We assume the treatment effect (i.e. the differences between the treatments) is the same for all blocks.

Blocking
Example (Randomized Complete Block Design)
Drugs A, B, C, and D are to be compared. We have formed blocks based on age, and we have 4 patients in each of 6 age blocks. We are not interested in testing for an age effect (it is a nuisance factor) as we are most interested in comparing treatments.
Within each block, we randomly assign the 4 drugs to the 4 patients, one drug per patient.
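
The restricted randomization in this example can be sketched as a separate shuffle inside each block. The function name `randomize_rcbd`, the block and patient labels, and the seed below are assumptions made for the illustration.

```python
import random

def randomize_rcbd(blocks, treatments, seed=None):
    """Assign every treatment once within each block, in random order."""
    rng = random.Random(seed)
    plan = {}
    for block, units in blocks.items():
        if len(units) != len(treatments):
            raise ValueError("a complete block needs one unit per treatment")
        order = list(treatments)
        rng.shuffle(order)  # a separate randomization for each block
        plan[block] = dict(zip(units, order))
    return plan

# Hypothetical labels: 6 age blocks with 4 patients each
blocks = {
    f"age_block_{b}": [f"patient_{b}_{i}" for i in range(1, 5)]
    for b in range(1, 7)
}
plan = randomize_rcbd(blocks, ["A", "B", "C", "D"], seed=27)

for block, assignment in plan.items():
    print(block, assignment)
```

Unlike the CRD, a patient can only be paired with drugs inside the patient's own age block, so every block always contains all four drugs.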

Blocking
By comparing treatments within a block, we remove the block-to-block variability from our treatment comparison analysis.
Blocking is a powerful tool and should be used if possible to control for any nuisance variation that is thought to be large.
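
To see what "removing block-to-block variability" means numerically, the sketch below compares the error sum of squares from an analysis that ignores blocks with the one from an analysis that accounts for them. The data are hypothetical (3 blocks, 3 treatments, one observation per cell), and the comparison relies on the standard additive block-plus-treatment decomposition.

```python
# Hypothetical responses: data[block][treatment]
data = {
    "block_1": {"A": 10.0, "B": 12.0, "C": 15.0},
    "block_2": {"A": 20.0, "B": 23.0, "C": 25.0},
    "block_3": {"A": 30.0, "B": 31.0, "C": 36.0},
}
blocks = list(data)
treatments = ["A", "B", "C"]

all_obs = [data[b][t] for b in blocks for t in treatments]
grand = sum(all_obs) / len(all_obs)
trt_mean = {t: sum(data[b][t] for b in blocks) / len(blocks) for t in treatments}
blk_mean = {b: sum(data[b][t] for t in treatments) / len(treatments) for b in blocks}

# Variation between block means
ss_block = len(treatments) * sum((blk_mean[b] - grand) ** 2 for b in blocks)

# Error SS if blocks are ignored: block differences stay in the error term
sse_ignoring_blocks = sum((data[b][t] - trt_mean[t]) ** 2
                          for b in blocks for t in treatments)

# Error SS with blocks in the model: block differences are removed
sse_with_blocks = sum((data[b][t] - trt_mean[t] - blk_mean[b] + grand) ** 2
                      for b in blocks for t in treatments)

print(round(ss_block, 2))
print(round(sse_ignoring_blocks, 2))
print(round(sse_with_blocks, 2))
```

With these made-up numbers almost all of the leftover variation is block-to-block, so the error sum of squares drops sharply once blocks are accounted for. The error degrees of freedom also drop (from N - g to (r - 1)(g - 1)), which is why blocking pays off mainly when block differences are large.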

Randomized Complete Block Design (RCBD)
RCBD...
Uses restricted randomization, performed within each block.
* $g$ treatments
* $g$ EUs per block
* $r$ blocks
* $rg = N$ total units
It's like $r$ single-replication CRDs glued together.
The RCBD is used to increase the power and precision of an experiment by decreasing the error variance used in testing.

RCBD
Example (Randomized Complete Block Design)
Here, we revisit the golden hamster example and perceive the 4 treatments as a single superfactor created from the combination of DayLength and Climate, and we include a nuisance factor Litter (similar to "family") to be used as a blocking factor. We expect large litter-to-litter variability due to genetics.
From each of $L$ litters, we have 4 hamsters.
Treatments:
A) cold/short
B) cold/long
C) warm/short
D) warm/long

RCBD
Example (CRD vs. RCBD)
See handout on CRD and RCBD for litter example.

Fixed effects vs. Random effects
Sometimes the levels of a factor are random. Then this is a random factor and it has random effects.
For example, when we randomly choose the litters in our hamster experiment, the factor Litter has random levels, usually numbered as 1, 2, 3,..., and they were chosen from a large population of possible litters.
If we repeated the hamster experiment, and again randomly chose litters, the litters from the first experiment would be different than the litters from the second experiment (again, random levels).
When the levels of a factor are fixed values, then it is a fixed factor and has fixed effects.
For example, when you have levels of circle and square for the factor Shape, if you repeated the experiment, you would have the same two shapes (they were not randomly chosen).
Primary factors usually have fixed effects.

Random effects
Example (Random factor "Litter")
The litters in the experiment are a random draw from the large population of litters available.
Example (Random factor "Day")
The days in your experiment are a random draw (in theory) from the large population of days available.
The variability among the litters or the days in the examples above is meant to represent the general variability among these units in the given population.
We model random effects and fixed effects differently.
Again, primary factors (i.e. factors of interest) are usually fixed effects.

Fixed effects vs. Random effects
The name of the factor does not tell you whether or not it has random effects. Consider the factor called Dosage.
Example (Dosage as a fixed effect)
There are 3 levels of dosage in the experiment, set at 10ml, 20ml, and 30ml. These are the only dosage levels presently of interest. We want to know if the response is significantly different between 10ml and 20ml (and 10ml vs 30ml, 20ml vs 30ml). If we repeated the experiment, we would use these same three dosages again.
Example (Dosage as a random effect)
Three dosage levels are randomly chosen from all those available. We will label them as $d_1$, $d_2$, and $d_3$. The variability in the response among $d_1$, $d_2$, and $d_3$ is meant to represent the variability among all dosages. If we repeated the experiment, we would not use these same three dosages.

Random effects
If we have a random factor with random effects, what do we wish to estimate?
For the dosage random effect, we want to make a statement about all dosages using a random sample of dosages.
Typically, we want to estimate the general variability among units, such as $\sigma^2_{\text{dosage}}$.
Blocking factors, in general, have random effects. We will start the course by treating them as fixed, and change this later as we move into the random effects topic.
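
As a preview of what estimating $\sigma^2_{\text{dosage}}$ can look like, the sketch below uses the standard method-of-moments (ANOVA) estimator for a balanced one-way random effects layout, (MS_between - MS_within) / n. This estimator is not covered on these slides, and the responses and names below are hypothetical, so treat it as an illustration only.

```python
# Hypothetical responses for three randomly chosen dosages, n = 4 units each
responses = {
    "d1": [10.0, 12.0, 11.0, 13.0],
    "d2": [15.0, 17.0, 16.0, 18.0],
    "d3": [20.0, 19.0, 22.0, 21.0],
}

a = len(responses)                      # number of sampled dosage levels
n = len(next(iter(responses.values())))  # observations per level (balanced)

all_obs = [y for ys in responses.values() for y in ys]
grand = sum(all_obs) / len(all_obs)
level_mean = {d: sum(ys) / n for d, ys in responses.items()}

# Mean square between dosage levels, on a - 1 degrees of freedom
ms_between = n * sum((m - grand) ** 2 for m in level_mean.values()) / (a - 1)

# Mean square within dosage levels, on a(n - 1) degrees of freedom
ms_within = sum((y - level_mean[d]) ** 2
                for d, ys in responses.items() for y in ys) / (a * (n - 1))

# Method-of-moments estimate of the among-dosage variance component.
# Truncating at zero is a common convention, used here as an assumption.
sigma2_dosage_hat = max((ms_between - ms_within) / n, 0.0)

print(round(ms_between, 3))
print(round(ms_within, 3))
print(round(sigma2_dosage_hat, 3))
```

The estimate describes variability among all dosages in the population, not differences between the three particular dosages that happened to be sampled.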