A Short DOE Glossary
blocking

An experimental technique that allows the possible effects of known but uncontrolled variables to be completely eliminated from an experiment. Here is a simple example. Suppose you wanted to do an experiment comparing basketball-shooting accuracy with the dominant hand (e.g., the right hand for right-handers) vs. the nondominant hand. One way to do the experiment would be to randomly select 10 people, randomly assign each a hand to shoot with, and then compare the overall right-hand results with the overall left-hand results. Done this way, it is quite possible that any difference between the hands would get washed out by the large overall differences in individual basketball-shooting ability. A better way would be to randomly select 5 people and have each shoot with both the right and the left hand (perhaps varying which hand is used first in random order). One can then look at the difference for each person and average these differences as an overall measure of right vs. left ability. Because the variation between people cancels out when things are done this way, this variation is completely eliminated from the experiment -- it is as if it didn't even exist. Done this way, the experiment has been blocked on people. Of course, this is the simplest sort of blocking that can occur, but it does illustrate the idea.

comparative experiment

Any experiment whose purpose is to determine the quantitative effect of input(s) that are deliberately changed (the experimental "variables" or "factors") on measured output(s) (the "response(s)").

confounding or aliasing

Main and/or interaction effects are said to be confounded or aliased if only their combined effect, not their separate individual effects, can be determined from the experimental design. As a very simple example, suppose that one ran two experimental trials to determine the effect of teaching method and instructor on student performance.
A group of 40 students is randomly split into two groups of 20. Half are assigned to Teacher A using method 1 to teach, say, factoring in a math class. The other half are assigned to Teacher B using method 2. After the factoring unit is covered, the performance of the two groups is assessed by comparing student scores on a common exam. Clearly, any systematic difference between the groups can only be ascribed to the combined effect of different teacher and different method, as both teacher and method change together. In DOE terminology, the effects of teacher and teaching method are fully confounded or aliased with one another. Note that no amount of data analysis can determine the separate effects; the aliasing is inherent to the design. Although it may seem that one would always want to avoid such aliasing, it turns out that this is not the case -- and aliasing is essentially unavoidable anyway. Indeed, proper control of aliasing turns out to be one of the keys to the sequential design strategy. Note, also, that one may partially confound effects in a design. This is an advanced (but quite useful) technique.

design constraint

A mathematical or physical limitation that restricts the possible combinations of the factors that can be tried. For example, in a mixtures experiment (an experiment in which the factors are proportions of the mixture ingredients), any given combination of proportions must always add up to 100% (of course!). In a chemical experiment in which the factors are concentrations of various chemicals, certain concentrations may be explosive and so must be avoided.

continuous or measurement-type factor

An experimental factor that can, in principle, be set anywhere within its experimental range for an experimental trial. Examples are temperature, time, pH, amount of fertilizer added to the soil, height, weight, and so forth.

D-optimal design criterion

D-optimality is a mathematical technique that is sometimes useful in producing experimental designs, especially in nonstandard and irregular (e.g., not hypercubes or hyperspheres) design spaces. Special-purpose computer software is required to use this method, as the computations are far too extensive to be done by hand. For those who might care, finding D-optimal designs is an NP-complete problem, so such designs are only approximated by the software.

design resolution

The degree of confounding present in a design. Design resolution refers to the amount of detail -- separate identification of factor effects and interactions -- that the design supports. This is only relevant for multifactor, not OFAT, experiments.

discrete or categorical factor

An experimental factor that can be set only at distinct, separate levels. For example: male or female (in an experiment on fish behavior); metal, glass, or plastic stirrer (in a chemical experiment); type of soil -- sandy, clay, loam, or gravel (in an experiment on plant growth). Note that categorical factors can be either unordered (male/female) or ordered (number of times the rat previously traversed the maze).

efficiency of an experimental design

The amount of information generated by an experimental design; equivalently, the precision in the fitted coefficients of the response surface. Although a complete explanation of this is rather technical, what it comes down to is a way of defining the amount of averaging that the design can achieve.
A more efficient design generates more information, which is equivalent to saying that the response surface is known with greater precision, which is equivalent to saying that there is less uncertainty in the conclusions. The important idea is that, for a fixed amount of experimental effort, usually the more efficient the design, the better.

experimental precision

The amount of experimental variability that exists, usually determined from the variability of replicated trials. The greater the precision, the less variability there is and the less uncertainty there is in the results, including the fitted response surface.
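As a minimal sketch of quantifying precision from replicates (the response values below are hypothetical), the sample standard deviation of repeated trials at fixed factor settings estimates the experimental variability:

```python
from statistics import mean, stdev

# Hypothetical responses from 5 replicated trials at fixed factor settings
replicates = [10.1, 9.8, 10.4, 10.0, 9.7]

noise = stdev(replicates)  # experimental variability: smaller means more precise
print(round(mean(replicates), 2))   # 10.0
print(round(noise, 2))
```

The smaller this standard deviation, the less uncertainty in any conclusion drawn from the fitted response surface.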
experimental bias

The tendency of an experiment to produce results that systematically differ from the true results. For example, a result may be "biased high" if an instrument is improperly operated. A biased measurement is one that is, on average, either higher or lower than it should be.

experimental variability, error, or noise

These words are used synonymously to refer to the fact that when experimental trials are repeated without changing the settings of the factors, the response varies rather than remaining constant. This is due, of course, to the (hopefully small) effects of changes in the many uncontrolled factors that exist in any experiment or measurement. It is never possible to exactly repeat anything. In order to quantify such variability -- which is necessary in order to properly assess how the response depends on the experimental factors -- statistical methods must be used.

(experimental) factor or variable

A variable that is deliberately manipulated in an experiment in order to assess its effect on the response. For example, in an experiment to assess the effect of various lengths, diameters, and materials on the voltage drop across a length of wire, the experimental factors are the length, diameter, and material of the wire. The response is the measured voltage drop.

factor level

The setting of an experimental factor. Typically, in DOE, continuous factors are "standardized" to the range from -1 to +1. For example, if temperature is an experimental factor that is to be varied between 30 and 60 C, then convert 30 to -1, 60 to +1, and linearly interpolate any value in between (e.g., 50 interpolates to +1/3). This is equivalent to making a simple linear scale change (like Fahrenheit to centigrade, pounds to kilograms, and so forth). Note, however, that with categorical factors, the ±1 standardization can only be done when there are exactly two categories.
When there are more, it makes no sense, because it would convert a non-ordered identifying label (which of, say, 3 seed varieties) to an ordered scale (-1, 0, +1). For this reason, advanced methods must be used to design experiments with categorical factors having more than two categories.

Fisher, Sir Ronald A.

The famous British geneticist and statistician who originated and developed the foundations of experimental design. His books, Statistical Methods for Research Workers and The Design of Experiments, are classics. Much standard statistical terminology -- like ANOVA and randomization test -- derives from his work. The "F" of the statistical F distribution (upon which the F test is based) is named after him.

foldover

A sequential design technique that produces a "mirror image" of a given design in order to separate confounded interactions. Generally, this converts designs of resolution III to designs of resolution IV.
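The ±1 standardization described under "factor level" is just a linear rescaling, and can be sketched as follows (the function name is illustrative, not from the glossary):

```python
def standardize(x, lo, hi):
    """Linearly map a factor setting x from its range [lo, hi] onto [-1, +1]."""
    return 2.0 * (x - lo) / (hi - lo) - 1.0

# Temperature varied between 30 and 60 C, as in the glossary's example:
print(standardize(30, 30, 60))   # -1.0
print(standardize(60, 30, 60))   # 1.0
print(standardize(50, 30, 60))   # about +1/3
```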
fractional factorial design

A design in which only a selected fraction of all the possible combinations of the design factors is run. For the two-level hypercube designs, this means only a subset of all the hypercube corners is actually run.

hidden replication

An experimental design is said to permit hidden replication when, if some of the factors can be safely ignored, there is replication in the remaining factors. For example, suppose one did a 2-factor experiment in which runs at the four different combinations (-,-), (-,+), (+,-), and (+,+) were conducted. If the second of the two factors had essentially no effect, then, so far as Mother Nature was concerned, what was actually done was a one-factor experiment in which the + and - settings were each replicated twice. This replication was "hidden" until it became clear that the second factor could be safely ignored. Hidden replication is most commonly used in "screening" experiments with many (more than 4, say) factors.

hypercube

The equivalent of a cube in an arbitrary number of dimensions. In DOE, hypercubes are usually standardized so that all coordinate entries are ±1. Hence a 2-d hypercube is just the ordinary square with 4 corners at (-1,-1), (-1,+1), (+1,-1), and (+1,+1). To save writing, the 1's are usually omitted, so we would give the corners simply as (-,-), (-,+), (+,-), and (+,+). Using this convention, a 3-d hypercube is just an ordinary cube with 8 corners at (-,-,-), (+,-,-), (-,+,-), (+,+,-), (-,-,+), (+,-,+), (-,+,+), and (+,+,+). And so on with 4, 5, and more dimensions. Note that in 2 dimensions there are 2^2 = 4 corners; in 3, there are 2^3 = 8; in 4, there are 2^4 = 16; and, in general, in n dimensions a hypercube has 2^n corners.

interaction

A 2-factor interaction (2fi) is the difference between the response that occurs when both factors are changed simultaneously and what would be expected based on the effect of changing the factors individually.
When the combined effect is significantly greater than the sum of the individual effects, it is often called symbiosis; when it is significantly less, it is often called interference. Algebraically, a 2-factor interaction is represented by the presence of a cross-product term (factor_1 * factor_2) in the model. Graphically, 2fi's are indicated by significant non-parallelism of the two lines in an interaction plot. An example of such a plot is provided in the herbicide example. Higher-order interactions -- that is, 3-or-more-factor interactions -- may also occasionally be important. However, these require more complicated designs with more experimental trials than are typically used, so in the basic approach followed in the DOE project they are not considered.

multifactor design

An experiment with several experimental factors in which more than one factor at a time is changed.
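The hypercube and hidden-replication entries above can be sketched together: generating all 2^n ±1-coded corners, and counting how runs collapse into replicates when an inert factor is dropped.

```python
from itertools import product
from collections import Counter

def hypercube_corners(n):
    """All 2**n corners of the ±1-coded n-dimensional hypercube."""
    return list(product([-1, +1], repeat=n))

corners = hypercube_corners(3)
print(len(corners))          # 8, i.e. 2**3
print(corners[0])            # (-1, -1, -1)

# Hidden replication: if the last factor turns out to be inert and is
# dropped, each remaining setting combination appears twice.
collapsed = Counter(c[:-1] for c in corners)
print(collapsed[(-1, -1)])   # 2
```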
parsimony, Occam's razor, or the Pareto Principle

All of these terms are used equivalently here and refer to the "vital few, trivial many" principle. That is, in any real experiment in which many factors are considered, almost always only a very small proportion of them will have most of the effect. The rest should be treated as the "trivial many" and considered indistinguishable from experimental noise. So in trying to build a model (= fit a simple algebraic equation, in basic DOE) to describe the experimental results, one should try to find one that uses as few factors (= parameters = coefficients) as possible. That is, one should be as parsimonious with coefficients as possible. Occam's Razor refers to the idea that if several models (theoretical or experimental) do equally well in explaining what is observed, then the simplest one (fewest parameters) should be chosen.

randomization

Running the experimental trials in a random order. This is done to protect against the systematic effects of unknown non-experimental variables (like environment) that might bias the experimental results. There are also other ways in which random assignment is used. For example, in clinical trials to determine the efficacy and safety of new drugs, patients are almost always assigned randomly to the treatment group (receive the drug) vs. the control group (receive an inactive placebo). This prevents unconscious biases (for example, assigning sicker people to receive the drug) from influencing the experimental results.

replication

Repeating an experimental trial at constant factor settings in order to determine the amount of experimental variability. Since the settings of the experimental factors do not change, observed variability in the response must be due to the effects of other, uncontrolled factors that are present throughout the experiment. This includes measurement factors.
It is important when doing replicates NOT to do them close together in time under nearly identical circumstances. Rather, the replicates should be done over the same range of conditions in which the entire experiment is performed. This allows all the experimental variability that is actually present to be observed and quantified.

residual analysis

Residuals are what is "left over" from the data after a model has been fit. That is, the residuals are defined as:

residual = actual data value - value predicted by fitted model

If the model fits well, then all systematic behavior is predicted by the model and the residuals should look like random noise. When residuals depart from this behavior and exhibit systematic trends or dependencies, the model may need to be modified. This, in turn, may require appropriate design changes at the next stage of the experimental process.

response surface
The higher-dimensional "surface" of true responses (that is, absent all extraneous experimental variability) obtained from all possible combinations of settings of the experimental factors (over their allowable experimental ranges). Knowledge of this surface is equivalent to a complete understanding of how the response depends on the experimental factors. If some or all of the factors are categorical, the "surface" may actually be isolated points, curves, or other lower-dimensional structures.

response variable

A measured experimental outcome. Response variables may come in many forms. For example, the response in a physics experiment exploring the effects of different numbers of windings and currents on the performance of an electromagnet could be the magnetic force generated. In an industrial experiment on a chemical process, the response might be the yield of the product. In an experiment to develop a new cake mix, the response variables might be taste and texture as rated by a panel of raters on a 1-to-10 scale. In an experiment to see what effect height, sex, and distance from the basket have on foul-shooting accuracy, the response could be the number of baskets made out of ten tries. A single experiment might have several response variables that characterize different aspects of the outcome. The key idea is that there must be some kind of "reliable" measurement that can be used for analysis of the results. What is meant by "reliable" is, itself, a complex statistical issue.

response surface methods

A broad category of experimental design and analysis methods based on fitting models that are linear and quadratic equations in the experimental factors (this includes cross-terms for interactions). Such purely empirical models are useful for describing system behavior, for process improvement, and often for increasing understanding so that more detailed conceptual (mechanistic) models can be developed.
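The residual definition given earlier (residual = actual data value - value predicted by fitted model) can be sketched with hypothetical numbers:

```python
# Hypothetical observed responses and the values a fitted model predicts
actual    = [10.2, 11.9, 14.1, 15.8]
predicted = [10.0, 12.0, 14.0, 16.0]

# residual = actual data value - value predicted by fitted model
residuals = [round(a - p, 6) for a, p in zip(actual, predicted)]
print(residuals)   # [0.2, -0.1, 0.1, -0.2]

# Residuals that hover around zero with no trend suggest an adequate fit;
# a systematic pattern would suggest the model needs modification.
```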
screening design

A design in which relatively few experimental runs are used to efficiently study a large number of experimental factors, in order to "screen out" the few that are most active from the remainder that are relatively inactive over the ranges being considered. Such designs are very useful in the early stages of sequential experimentation in order to conserve resources and identify the most influential experimental factors for more detailed study. Other essentially synonymous terms for this are "resolution III," "Plackett-Burman," and "saturated" design.

sequential experimental strategy

Sequential experimentation means investigations that are carried out in stages, so that each successive experiment can be designed and executed in the light of information gained from previous ones. This is really a description of a scientific learning strategy that encourages the efficient expenditure of limited experimental resources. Although most experimenters intuitively try to do things this way, there are specific design and analytical tools in DOE that have been rigorously developed for this purpose. Some of these procedures are: residual analysis, fractional factorial designs, design resolution, sequential assembly, foldover,
response surface methods, D-optimal design criteria, and steepest ascent/gradient optimization. It is important to emphasize that this provides experimenters a systematic framework -- not merely an artful philosophy -- in which to execute the strategy. This gives greater control and an improved likelihood of success.

sequential assembly of designs

Building and performing complex experimental designs one stage at a time. Later stages are added only when and if needed. This conserves experimental resources while yielding the maximal information at each stage of the assembly process.

split-plotting

A method for running experiments in non-random fashion when not all experimental factors can be completely randomized. This is an advanced topic that requires the use of nested ANOVA.

statistical model

A statistical model is an algebraic equation that expresses how a response of interest is related to the experimental factors and the experimental variability. For example,

(I): Resp = K + A*Factor_1 + B*Factor_2 + C*Factor_1*Factor_2 + random_variability

is such a model. "K", "A", "B", and "C" are unknown "parameters" or "coefficients" that must be estimated from the experimental data. Factor_1 and Factor_2 are the (known) settings of the two experimental factors at which the response is actually measured in the experiment. This model is said to be linear because the response is a linear function of the unknown coefficients. This can be a bit confusing, because the roles of unknown coefficient and known variable setting reverse what we are accustomed to in such equations. For example, the model

(II): Resp = K + A*[Factor_1]^2 + B*Factor_1 + C*Factor_2 + random_variability

is also linear for the same reason, even though Factor_1 now also appears as a squared term. In fact, in basic DOE only models of type (I) are usually considered. These suffice for many applications.
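As a sketch (with hypothetical responses) of estimating the coefficients of a type (I) model on a ±1-coded 2x2 factorial: because the design is orthogonal, the least-squares estimates reduce to simple sign-weighted averages of the responses.

```python
# Model (I): Resp = K + A*x1 + B*x2 + C*x1*x2 + random_variability.
# On a ±1-coded 2x2 factorial, orthogonality makes the least-squares
# estimates simple averages of the response times each column of signs.
runs = [(-1, -1), (+1, -1), (-1, +1), (+1, +1)]   # the four design corners
resp = [8.0, 12.0, 9.0, 17.0]                     # hypothetical responses

n = len(runs)
K = sum(resp) / n
A = sum(y * x1 for (x1, _), y in zip(runs, resp)) / n
B = sum(y * x2 for (_, x2), y in zip(runs, resp)) / n
C = sum(y * x1 * x2 for (x1, x2), y in zip(runs, resp)) / n
print(K, A, B, C)   # 11.5 3.0 1.5 1.0
```

With four runs and four coefficients, the model fits these responses exactly; replicated runs would be needed to estimate the random_variability term as well.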
steepest ascent/gradient methods

Methods for improvement based on experiments and analysis that model the response surface as a "mountain" (in n dimensions). The fastest way to climb such a mountain -- that is, the path of steepest ascent -- is to go straight up the sides. By mathematically determining this direction, one can determine how to change the experimental factors to effect the greatest possible change in the response.
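For a fitted first-order model the steepest-ascent direction is simply the vector of fitted coefficients. A minimal sketch, with hypothetical coefficients:

```python
import math

# With a fitted first-order model Resp ~ K + A*x1 + B*x2, the gradient
# is (A, B), so stepping the coded factors in proportion to (A, B)
# climbs the response surface fastest.
A, B = 3.0, 1.5                    # hypothetical fitted coefficients
norm = math.hypot(A, B)
direction = (A / norm, B / norm)   # unit step along the path of steepest ascent
print(direction)
```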
The Scientific Method

Some of the fundamental ways in which science is different from philosophy, art, or literature surely must include:

1. Science is about predicting observable phenomena. Merely giving explanations after the fact is not good enough: you must predict what will be observed before it is observed. The concept of observable phenomena is also central. This means that, given instructions on how to construct measurement equipment, anyone who produces the equipment should be able to measure the "same" results (within experimental variability). Science is democratic and replicable; that is, observation should not depend on who we are, what beliefs we hold, or what salary we make. Of course, the predictions may be probabilistic and involve a level of uncertainty: we only know that the likelihood of thunderstorms is higher under some conditions than others, or that a major earthquake will almost certainly occur along the San Andreas fault within 100 years. Although such predictions involve uncertainty, they are just as legitimate science as, say, the prediction of a space shuttle's orbit.

2. Predictions are made by the development of scientific models. Science is not about discovering eternal truths; rather, it is about developing models from which precise and accurate predictions can be made. On the most fundamental level, science does not discuss truth or the underlying nature of reality at all -- that is the realm of philosophy. All scientific models can be flawed or incomplete in some respects but still be useful for making predictions within certain defined realms. Another way of saying this is that all scientific models are falsifiable, but none can be proven (unlike mathematics). There may always be another consequence that observation will contradict.

3. Models are usually, but not always, quantitative and expressed mathematically.
One can broadly distinguish two overlapping kinds of scientific models: mechanistic or conceptual models, in which some kind of theoretical construct is used to develop the model; and empirical models, which are based exclusively on observed data (and use statistical analysis to develop predictions of what will be observed in the future under other conditions). Overlap occurs because extended observation usually motivates the development of conceptual models, and conceptual models must always be criticized (i.e., put to the test) by real data.

4. Because all science involves observable phenomena, the inevitable presence of some uncontrolled variability in all observations means that no scientific observation is exactly known -- "we see through a glass darkly." All scientific observation therefore involves uncertainty, and that uncertainty must be dealt with explicitly and quantitatively as part of the process of scientific learning. Contrast this with philosophy, religion, or literature, for example.
[Figure 10.23: Interpreting r = 0 for curvilinear data. Establishing causation requires solid scientific understanding.]
More informationChapter 4 DESIGN OF EXPERIMENTS
Chapter 4 DESIGN OF EXPERIMENTS 4.1 STRATEGY OF EXPERIMENTATION Experimentation is an integral part of any human investigation, be it engineering, agriculture, medicine or industry. An experiment can be
More informationCRITERIA FOR USE. A GRAPHICAL EXPLANATION OF BI-VARIATE (2 VARIABLE) REGRESSION ANALYSISSys
Multiple Regression Analysis 1 CRITERIA FOR USE Multiple regression analysis is used to test the effects of n independent (predictor) variables on a single dependent (criterion) variable. Regression tests
More informationThe Practice of Statistics 1 Week 2: Relationships and Data Collection
The Practice of Statistics 1 Week 2: Relationships and Data Collection Video 12: Data Collection - Experiments Experiments are the gold standard since they allow us to make causal conclusions. example,
More informationRESPONSE SURFACE MODELING AND OPTIMIZATION TO ELUCIDATE THE DIFFERENTIAL EFFECTS OF DEMOGRAPHIC CHARACTERISTICS ON HIV PREVALENCE IN SOUTH AFRICA
RESPONSE SURFACE MODELING AND OPTIMIZATION TO ELUCIDATE THE DIFFERENTIAL EFFECTS OF DEMOGRAPHIC CHARACTERISTICS ON HIV PREVALENCE IN SOUTH AFRICA W. Sibanda 1* and P. Pretorius 2 1 DST/NWU Pre-clinical
More informationIAPT: Regression. Regression analyses
Regression analyses IAPT: Regression Regression is the rather strange name given to a set of methods for predicting one variable from another. The data shown in Table 1 and come from a student project
More informationmultilevel modeling for social and personality psychology
1 Introduction Once you know that hierarchies exist, you see them everywhere. I have used this quote by Kreft and de Leeuw (1998) frequently when writing about why, when, and how to use multilevel models
More informationTo conclude, a theory of error must be a theory of the interaction between human performance variability and the situational constraints.
The organisers have provided us with a both stimulating and irritating list of questions relating to the topic of the conference: Human Error. My first intention was to try to answer the questions one
More informationExperimental and survey design
Friday, October 12, 2001 Page: 1 Experimental and survey design 1. There is a positive association between the number of drownings and ice cream sales. This is an example of an association likely caused
More informationVariable Data univariate data set bivariate data set multivariate data set categorical qualitative numerical quantitative
The Data Analysis Process and Collecting Data Sensibly Important Terms Variable A variable is any characteristic whose value may change from one individual to another Examples: Brand of television Height
More informationMeasurement and meaningfulness in Decision Modeling
Measurement and meaningfulness in Decision Modeling Brice Mayag University Paris Dauphine LAMSADE FRANCE Chapter 2 Brice Mayag (LAMSADE) Measurement theory and meaningfulness Chapter 2 1 / 47 Outline 1
More informationScientific Inquiry Section 1: Length & Measurement ruler or meter stick: equipment used in the lab to measure length in millimeters, centimeters or
Scientific Inquiry Section 1: Length & Measurement ruler or meter stick: equipment used in the lab to measure length in millimeters, centimeters or meters. meter: metric unit for length -Scientists use
More informationScientific Research. The Scientific Method. Scientific Explanation
Scientific Research The Scientific Method Make systematic observations. Develop a testable explanation. Submit the explanation to empirical test. If explanation fails the test, then Revise the explanation
More informationThe Scientific Method
Course "Empirical Evaluation in Informatics" The Scientific Method Prof. Dr. Lutz Prechelt Freie Universität Berlin, Institut für Informatik http://www.inf.fu-berlin.de/inst/ag-se/ Science and insight
More informationFinal Exam: PSYC 300. Multiple Choice Items (1 point each)
Final Exam: PSYC 300 Multiple Choice Items (1 point each) 1. Which of the following is NOT one of the three fundamental features of science? a. empirical questions b. public knowledge c. mathematical equations
More informationUnit 1 History and Methods Chapter 1 Thinking Critically with Psychological Science
Myers PSYCHOLOGY (7th Ed) Unit 1 History and Methods Chapter 1 Thinking Critically with James A. McCubbin, PhD Clemson University Worth Publishers Fact vs. Falsehood 1. Human intuition is remarkably accurate
More informationCompletely randomized designs, Factors, Factorials, and Blocking
Completely randomized designs, Factors, Factorials, and Blocking STAT:5201 Week 2: Lecture 1 1 / 35 Completely Randomized Design (CRD) Simplest design set-up Treatments are randomly assigned to EUs Easiest
More informationHandout on Perfect Bayesian Equilibrium
Handout on Perfect Bayesian Equilibrium Fudong Zhang April 19, 2013 Understanding the concept Motivation In general, the Perfect Bayesian Equilibrium (PBE) is the concept we are using when solving dynamic
More informationEmpowered by Psychometrics The Fundamentals of Psychometrics. Jim Wollack University of Wisconsin Madison
Empowered by Psychometrics The Fundamentals of Psychometrics Jim Wollack University of Wisconsin Madison Psycho-what? Psychometrics is the field of study concerned with the measurement of mental and psychological
More informationSix Sigma Glossary Lean 6 Society
Six Sigma Glossary Lean 6 Society ABSCISSA ACCEPTANCE REGION ALPHA RISK ALTERNATIVE HYPOTHESIS ASSIGNABLE CAUSE ASSIGNABLE VARIATIONS The horizontal axis of a graph The region of values for which the null
More information4. Model evaluation & selection
Foundations of Machine Learning CentraleSupélec Fall 2017 4. Model evaluation & selection Chloé-Agathe Azencot Centre for Computational Biology, Mines ParisTech chloe-agathe.azencott@mines-paristech.fr
More informationTechnical Specifications
Technical Specifications In order to provide summary information across a set of exercises, all tests must employ some form of scoring models. The most familiar of these scoring models is the one typically
More informationLec 02: Estimation & Hypothesis Testing in Animal Ecology
Lec 02: Estimation & Hypothesis Testing in Animal Ecology Parameter Estimation from Samples Samples We typically observe systems incompletely, i.e., we sample according to a designed protocol. We then
More informationMeasurement. 500 Research Methods Mike Kroelinger
Measurement 500 Research Methods Mike Kroelinger Levels of Measurement Nominal Lowest level -- used to classify variables into two or more categories. Cases placed in the same category must be equivalent.
More informationChapter 11: Experiments and Observational Studies p 318
Chapter 11: Experiments and Observational Studies p 318 Observation vs Experiment An observational study observes individuals and measures variables of interest but does not attempt to influence the response.
More informationThe Role of Feedback in Categorisation
The Role of in Categorisation Mark Suret (m.suret@psychol.cam.ac.uk) Department of Experimental Psychology; Downing Street Cambridge, CB2 3EB UK I.P.L. McLaren (iplm2@cus.cam.ac.uk) Department of Experimental
More informationHUMAN-COMPUTER INTERACTION EXPERIMENTAL DESIGN
HUMAN-COMPUTER INTERACTION EXPERIMENTAL DESIGN Professor Bilge Mutlu Computer Sciences, Psychology, & Industrial and Systems Engineering University of Wisconsin Madison CS/Psych-770 Human-Computer Interaction
More informationEnumerative and Analytic Studies. Description versus prediction
Quality Digest, July 9, 2018 Manuscript 334 Description versus prediction The ultimate purpose for collecting data is to take action. In some cases the action taken will depend upon a description of what
More informationMath 1680 Class Notes. Chapters: 1, 2, 3, 4, 5, 6
Math 1680 Class Notes Chapters: 1, 2, 3, 4, 5, 6 Chapter 1. Controlled Experiments Salk vaccine field trial: a randomized controlled double-blind design 1. Suppose they gave the vaccine to everybody, and
More informationLecture 01 Analysis of Animal Populations: Theory and Scientific Process
1 of 1 Lecture 01 Analysis of Animal Populations: Theory and Scientific Process Motivation 1. Review the basic theory of animal population dynamics 2. Lay the foundation for the analysis of animal populations
More informationPubH 7470: STATISTICS FOR TRANSLATIONAL & CLINICAL RESEARCH
PubH 7470: STATISTICS FOR TRANSLATIONAL & CLINICAL RESEARCH Instructor: Chap T. Le, Ph.D. Distinguished Professor of Biostatistics Basic Issues: COURSE INTRODUCTION BIOSTATISTICS BIOSTATISTICS is the Biomedical
More informationTHE ROLE OF THE COMPUTER IN DATA ANALYSIS
CHAPTER ONE Introduction Welcome to the study of statistics! It has been our experience that many students face the prospect of taking a course in statistics with a great deal of anxiety, apprehension,
More informationChapter 4: More about Relationships between Two-Variables Review Sheet
Review Sheet 4. Which of the following is true? A) log(ab) = log A log B. D) log(a/b) = log A log B. B) log(a + B) = log A + log B. C) log A B = log A log B. 5. Suppose we measure a response variable Y
More informationInterpretation of Data and Statistical Fallacies
ISSN: 2349-7637 (Online) RESEARCH HUB International Multidisciplinary Research Journal Research Paper Available online at: www.rhimrj.com Interpretation of Data and Statistical Fallacies Prof. Usha Jogi
More informationStudent Performance Q&A:
Student Performance Q&A: 2009 AP Statistics Free-Response Questions The following comments on the 2009 free-response questions for AP Statistics were written by the Chief Reader, Christine Franklin of
More informationChapter 02 Developing and Evaluating Theories of Behavior
Chapter 02 Developing and Evaluating Theories of Behavior Multiple Choice Questions 1. A theory is a(n): A. plausible or scientifically acceptable, well-substantiated explanation of some aspect of the
More informationResearch and science: Qualitative methods
Research and science: Qualitative methods Urban Bilstrup (E327) Urban.Bilstrup@hh.se 140922 2 INTRODUCTION TO RESEARCH AND SCIENTIFIC METHODS Outline Definitions Problem formulation? Aim and goal operational
More informationQualitative and Quantitative Approaches Workshop. Comm 151i San Jose State U Dr. T.M. Coopman Okay for non-commercial use with attribution
Qualitative and Quantitative Approaches Workshop Comm 151i San Jose State U Dr. T.M. Coopman Okay for non-commercial use with attribution This Workshop This is a research skill workshop. This workshop
More informationReinforcement Learning : Theory and Practice - Programming Assignment 1
Reinforcement Learning : Theory and Practice - Programming Assignment 1 August 2016 Background It is well known in Game Theory that the game of Rock, Paper, Scissors has one and only one Nash Equilibrium.
More informationMark J. Anderson, Patrick J. Whitcomb Stat-Ease, Inc., Minneapolis, MN USA
Journal of Statistical Science and Application (014) 85-9 D DAV I D PUBLISHING Practical Aspects for Designing Statistically Optimal Experiments Mark J. Anderson, Patrick J. Whitcomb Stat-Ease, Inc., Minneapolis,
More informationBetween Micro and Macro: Individual and Social Structure, Content and Form in Georg Simmel
Michela Bowman Amy LeClair Michelle Lynn Robert Weide Classical Sociological Theory November 11, 2003 Between Micro and Macro: Individual and Social Structure, Content and Form in Georg Simmel Through
More informationWELCOME! Lecture 11 Thommy Perlinger
Quantitative Methods II WELCOME! Lecture 11 Thommy Perlinger Regression based on violated assumptions If any of the assumptions are violated, potential inaccuracies may be present in the estimated regression
More informationStructural Equation Modeling (SEM)
Structural Equation Modeling (SEM) Today s topics The Big Picture of SEM What to do (and what NOT to do) when SEM breaks for you Single indicator (ASU) models Parceling indicators Using single factor scores
More informationIndiana Academic Standards Addressed By Zoo Program WINGED WONDERS: SEED DROP!
Indiana Academic Standards Addressed By Zoo Program WINGED WONDERS: SEED DROP! Program description: Discover how whether all seeds fall at the same rate. Do small or big seeds fall more slowly? Students
More informationSTA630 Research Methods Solved MCQs By
STA630 Research Methods Solved MCQs By http://vustudents.ning.com 31-07-2010 Quiz # 1: Question # 1 of 10: A one tailed hypothesis predicts----------- The future The lottery result The frequency of the
More informationSOME PRINCIPLES OF FIELD EXPERlMENTS WITH SHEEP By P. G. SCHINCICEL *, and G. R. MOULE *
SOME PRINCIPLES OF FIELD EXPERlMENTS WITH SHEEP By P. G. SCHINCICEL *, and G. R. MOULE * Summary The principles of scientific method, with particular reference to the role of hypotheses and experiments
More informationLecture 4: Research Approaches
Lecture 4: Research Approaches Lecture Objectives Theories in research Research design approaches ú Experimental vs. non-experimental ú Cross-sectional and longitudinal ú Descriptive approaches How to
More informationChapter 2. The Data Analysis Process and Collecting Data Sensibly. Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc.
Chapter 2 The Data Analysis Process and Collecting Data Sensibly Important Terms Variable A variable is any characteristic whose value may change from one individual to another Examples: Brand of television
More informationResearch. how we figure stuff out. Methods
Research how we figure stuff out Methods Penny in the Glass Activity Let s Make A Deal! One volunteer is needed for a chance to win $1,334,499! Let s Make A Deal Shows Us That: Human Intuition is highly
More informationThe essential focus of an experiment is to show that variance can be produced in a DV by manipulation of an IV.
EXPERIMENTAL DESIGNS I: Between-Groups Designs There are many experimental designs. We begin this week with the most basic, where there is a single IV and where participants are divided into two or more
More informationReliability of Ordination Analyses
Reliability of Ordination Analyses Objectives: Discuss Reliability Define Consistency and Accuracy Discuss Validation Methods Opening Thoughts Inference Space: What is it? Inference space can be defined
More informationWhat is Science 2009 What is science?
What is science? The question we want to address is seemingly simple, but turns out to be quite difficult to answer: what is science? It is reasonable to ask such a question since this is a book/course
More informationLearning with Rare Cases and Small Disjuncts
Appears in Proceedings of the 12 th International Conference on Machine Learning, Morgan Kaufmann, 1995, 558-565. Learning with Rare Cases and Small Disjuncts Gary M. Weiss Rutgers University/AT&T Bell
More informationEvaluation Models STUDIES OF DIAGNOSTIC EFFICIENCY
2. Evaluation Model 2 Evaluation Models To understand the strengths and weaknesses of evaluation, one must keep in mind its fundamental purpose: to inform those who make decisions. The inferences drawn
More informationStudy on perceptually-based fitting line-segments
Regeo. Geometric Reconstruction Group www.regeo.uji.es Technical Reports. Ref. 08/2014 Study on perceptually-based fitting line-segments Raquel Plumed, Pedro Company, Peter A.C. Varley Department of Mechanical
More informationMeasuring and Assessing Study Quality
Measuring and Assessing Study Quality Jeff Valentine, PhD Co-Chair, Campbell Collaboration Training Group & Associate Professor, College of Education and Human Development, University of Louisville Why
More informationChapter 4: Defining and Measuring Variables
Chapter 4: Defining and Measuring Variables A. LEARNING OUTCOMES. After studying this chapter students should be able to: Distinguish between qualitative and quantitative, discrete and continuous, and
More informationChapter 13. Experiments and Observational Studies
Chapter 13 Experiments and Observational Studies 1 /36 Homework Read Chpt 13 Do p312 1, 7, 9, 11, 17, 20, 25, 27, 29, 33, 40, 41 2 /36 Observational Studies In an observational study, researchers do not
More informationOriginal content Copyright by Holt, Rinehart and Winston. Additions and changes to the original content are the responsibility of the instructor.
Answer Key Directed Reading A 1. life science 2. diversity 3. Answers may vary. Sample answer: Where does it live? 4. anyone 5. anywhere in a laboratory, on farms, in forests, on the ocean floor, in space,
More informationPlacebo and Belief Effects: Optimal Design for Randomized Trials
Placebo and Belief Effects: Optimal Design for Randomized Trials Scott Ogawa & Ken Onishi 2 Department of Economics Northwestern University Abstract The mere possibility of receiving a placebo during a
More informationAgreement Coefficients and Statistical Inference
CHAPTER Agreement Coefficients and Statistical Inference OBJECTIVE This chapter describes several approaches for evaluating the precision associated with the inter-rater reliability coefficients of the
More information2.75: 84% 2.5: 80% 2.25: 78% 2: 74% 1.75: 70% 1.5: 66% 1.25: 64% 1.0: 60% 0.5: 50% 0.25: 25% 0: 0%
Capstone Test (will consist of FOUR quizzes and the FINAL test grade will be an average of the four quizzes). Capstone #1: Review of Chapters 1-3 Capstone #2: Review of Chapter 4 Capstone #3: Review of
More information