
Research Questions, Variables, and Hypotheses: Part 2
RCS 6740
6/7/04

Review
What are research questions?
What are variables?
  Definition
  Function
  Measurement Scale

Hypotheses
OK, now that we know how to set up a research project by posing research questions and labeling variables, it is time to move on to a more formal way of structuring and interpreting research.

Hypotheses
Definitions
  Hypotheses are predictions about the relationship among two or more variables or groups, based on a theory or previous research (Pittenger, 2003).
  Hypotheses are assumptions or theories that a researcher makes and tests.
Why are hypotheses important?

Importance of Hypotheses
Hypotheses:
  Direct our observations
    Identify the variables examined and the data to be collected
  Describe a relationship among variables
    Can state that as one variable increases, the other will decrease; as one variable increases, the other will increase; and so on
  Refer to populations
    Help researchers infer that the results of a sample will translate to a population

4 Functions of Hypotheses
Hypotheses can:
  Estimate population characteristics
  Correlate variables
  Display differences among two or more populations
  Show possible cause and effect
What research designs relate to each of these 4 functions?

Symbols Used in Hypotheses
  M = mean
  µ (mu, pronounced "mew") = population mean
  Roman letters (e.g., A, B, C, D) are used to represent statistics
  Greek letters (e.g., α, β) are used to represent parameters
  α = significance level; the probability of committing a Type I error (e.g., α = .05)
  p = probability value (e.g., p = .05)
  Null Hypothesis: H0: µ1 - µ2 = 0, or H0: µ1 = µ2
  Alternative Hypothesis: H1: µ1 - µ2 ≠ 0, or H1: µ1 ≠ µ2

Types of Hypotheses
  Research Hypotheses
  Statistical Hypotheses

Research Hypotheses
Research Hypothesis: A statement of the relationship among two or more variables or groups, the acceptance or non-acceptance of which is based on resolving a logical alternative with a null hypothesis.
Example: Students who are taking RCS 6740 will score higher on Exam 2 than students who are not taking RCS 6740.

Research Hypotheses Cont.
Research hypotheses can be stated as Directional or Non-directional.
Directional hypotheses predict the specific relationship among two or more variables or groups:
  Students who are taking RCS 6740 will score higher on Exam 2 than students who are not taking RCS 6740
    H0: µ1 ≤ µ2
    H1: µ1 > µ2
  IQ scores will correlate in a positive manner with Self Esteem scores
  Cats will bark less frequently than Dogs

Research Hypotheses Cont.
Non-directional hypotheses predict that there will be differences among two or more groups, but do not specify the direction of the differences:
  Men and Women will differ in their recall of phone numbers
  The scores on the Geriatric Depression Scale will differ between people with Stroke and people with Alzheimer's disease
  IQ scores will correlate with Self Esteem scores

Research Hypotheses Cont.
Your Turn!
  Come up with some directional and non-directional research hypotheses
  Share with the class

Statistical Hypotheses
Statistical hypotheses are mathematical or logical statements that help researchers interpret the results of research.
Statistical hypotheses consist of the Null Hypothesis (H0), the hypothesis of no difference, and the Alternative Hypothesis (H1 or HA), which is similar in form to the research hypothesis.
  Null: H0: µ1 - µ2 = 0
  Alternative: H1: µ1 - µ2 ≠ 0
Written-out examples:
  Null: There will be no difference in Exam 2 scores between students who are taking RCS 6740 and students who are not taking RCS 6740
  Alternative: There will be a difference in Exam 2 scores between students who are taking RCS 6740 and students who are not taking RCS 6740

Statistical Hypotheses Cont.
Remember, and this is important:
  The null hypothesis always implies that there is no relation or statistical difference between variables or groups
  The alternative hypothesis implies that there is a meaningful relationship among variables or groups

Testing Hypotheses
We only test the null hypothesis; we do not test the research hypothesis.
Our decision about the null hypothesis is the only link between what our statistical models tell us about probability, and the decision we make concerning the research hypothesis (Williams, 1986, p. 55).

Testing Hypotheses Cont.
When a variable has more than one operational definition, more than one null hypothesis may be associated with one research hypothesis.
We use a variety of statistical procedures to test null hypotheses. The choice of procedure depends on a variety of factors, including the research hypothesis, the data, the sampling strategy, and what we want to be able to say as a result of our testing.

Types of Tests
Statistical procedures that are commonly used for hypothesis testing include: correlation, analysis of variance (ANOVA), analysis of covariance (ANCOVA), regression, multivariate analysis of variance (MANOVA), t-tests, and Chi-Square.
Each of these procedures has an associated test statistic, which is used to determine significance. For example, ANOVA, ANCOVA, and regression use F statistics and their associated p-values.
Multivariate procedures, like MANOVA, use a variety of test statistics with interesting names, like Wilks' lambda. These are then related to a more common test statistic, like F.
The secret here, for the layperson, is that all test statistics are eventually related to a probability distribution and a p-value. These p-values mean the same thing across test statistics.

Error Types
In hypothesis testing, we must contend with two types of errors: Type I and Type II.
Errors are mistakes that we can make when judging the null hypothesis.
  Type I error is what happens when the tested hypothesis is falsely rejected. (It is when you say you found something, but that something is really an error.) A Type I error is a false positive.
  Type II error is what happens when a false tested hypothesis is not rejected (Hays, 1986). (It is when you don't find something that is, in fact, there.) A Type II error is a false negative.

Error Types Cont.
Alpha is the level of probability (pre-set by the researcher) that the tested hypothesis will be falsely rejected. Alpha is the pre-set risk of a Type I error. In other words, alpha is the degree of risk that you accept, in advance of conducting the study, that what you find will be an error.
Beta is the probability (often neglected by the researcher) that a false null hypothesis will not be rejected. Beta is the probability that you won't find what you are looking for if, in fact, it is really there.

Error Types Cont.
The chart on the next slide, which is modified from Hays (1986) and Ferguson and Takane (1989), illustrates the relationships among Type I and Type II errors and alpha and beta.

Error Types Chart

  Decision                                     H0 is True          H1 is True
  Reject H0                                    Type I error (α)    Correct (1 - β)
  Fail to reject H0 (decide in favor of H0)    Correct (1 - α)     Type II error (β)
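The four cells of the chart can be illustrated with a small simulation. This is only a sketch under stated assumptions: it uses a two-sided one-sample z test with a known standard deviation, and the sample size, true mean, and number of trials are hypothetical values chosen for illustration, not figures from the lecture.

```python
import random
from statistics import NormalDist, mean

def z_test_rejects(sample, mu0, sigma, alpha):
    """Two-sided one-sample z test; True when H0: mu = mu0 is rejected."""
    z = (mean(sample) - mu0) / (sigma / len(sample) ** 0.5)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return p < alpha

def rejection_rate(true_mu, mu0=0.0, sigma=1.0, n=25, alpha=0.05,
                   trials=4000, seed=1):
    """Share of simulated studies in which H0 is rejected."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mu, sigma) for _ in range(n)]
        if z_test_rejects(sample, mu0, sigma, alpha):
            rejections += 1
    return rejections / trials

# H0 is true (true mean equals mu0): every rejection is a Type I error.
type_i_rate = rejection_rate(true_mu=0.0)

# H0 is false (hypothetical true mean of 0.5): rejections are correct,
# and every failure to reject is a Type II error.
power = rejection_rate(true_mu=0.5)
type_ii_rate = 1 - power

print(f"Type I rate (should sit near alpha): {type_i_rate:.3f}")
print(f"Type II rate (beta): {type_ii_rate:.3f}, power (1 - beta): {power:.3f}")
```

The Type I rate should land near the preset alpha, while the Type II rate depends on the size of the true difference and the sample size, which is why beta cannot simply be set in advance.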

Error Types Cont.
Link to Real World Example of Error Types:
http://www.intuitor.com/statistics/t1t2errors.html

Measurement
STATISTICAL POWER
Statistical power is the probability of rejecting a null hypothesis that is, in fact, false (Williams, 1986, p. 67). Put more simply, statistical power is the probability of finding relationships or differences that in fact exist (Cohen, 1988). In our fish story, it is the probability of finding fish in Lake Alice, if they are in fact there.
In terms of beta (the probability of a Type II error), statistical power = 1 - beta.
Statistical power is a function of the preset significance criterion [alpha], the reliability of sample results, and the effect size [the actual size of the difference or strength of the relationship]... (Cohen, 1988, p. 4).
Considering the complex interrelationships of the above criteria, one can say that:
  The researcher can easily set alpha, but cannot easily set beta.
  Alpha and beta are inversely, but not perfectly, related.
  Lowering alpha increases beta and lowers power.
  Increasing alpha decreases beta and increases power.

Statistical power is then related to:
  Sample size
  Effect size
  Statistical design (including number of groups, 1- vs. 2-tailed tests)
  Significance criteria
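The way power depends on alpha, effect size, and sample size can be sketched numerically. The functions below are illustrative assumptions, not the lecture's method: they use a normal (z) approximation for a two-sided comparison of two equal-sized groups, with the effect size expressed as a standardized mean difference. Published power tables based on the t distribution give slightly larger sample sizes.

```python
from statistics import NormalDist

def power_two_group_z(effect_size, n_per_group, alpha=0.05):
    """Approximate power (1 - beta) of a two-sided, two-group z test.

    effect_size is the standardized difference between the group means.
    """
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)              # critical value for alpha
    shift = effect_size * (n_per_group / 2) ** 0.5    # expected z under H1
    # Probability that the obtained z falls in either rejection region.
    return (1 - norm.cdf(z_crit - shift)) + norm.cdf(-z_crit - shift)

def n_per_group_for_power(effect_size, target_power=0.80, alpha=0.05):
    """Smallest equal group size whose approximate power reaches the target."""
    n = 2
    while power_two_group_z(effect_size, n, alpha) < target_power:
        n += 1
    return n

# Hypothetical planning values: a medium standardized effect of 0.5.
for alpha in (0.01, 0.05, 0.10):
    print(alpha, round(power_two_group_z(0.5, 64, alpha), 3))
print(n_per_group_for_power(0.5, target_power=0.80, alpha=0.05))
```

Holding the effect size and group size fixed, the printed power rises as alpha is raised and falls as alpha is lowered, which is the inverse relationship between alpha and beta described above.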

EFFECT SIZE
Effect size (ES) refers to the amount of common variance between the independent variable(s) (IV) and the dependent variable(s) (DV), or the degree to which changes in the IV(s) result in changes in the DV(s).

Effect Size Cont.
For example, if I am interested in the differences in competitive closure rate between rehabilitation counselors with master's degrees in rehabilitation counseling and those with bachelor's or unrelated master's degrees, my effect size would be the size of the difference between the means of the two groups.
Or, if I wanted to test a specific intervention for students with learning disabilities, and I had a test that I believed measured the effectiveness of my intervention, then my effect size might be the difference in test scores between an experimental group that received the intervention and a control group that did not receive the intervention.

Effect Size Cont.
Similarly, if I wanted to examine the impact of a specific course on research anxiety, effect size could be the difference in the mean scores of research anxiety between an experimental group who completed the course and a control group who did not.
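One common way to express a difference between two group means as an effect size is Cohen's d, the mean difference divided by the pooled standard deviation. The slides do not name a specific index, so treating d as the effect size here is an assumption, and the anxiety scores below are made-up numbers used only to show the arithmetic.

```python
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled sample variance."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / pooled_var ** 0.5

# Hypothetical research-anxiety scores (higher = more anxious).
control_group = [31, 28, 35, 30, 33, 29, 34, 32]   # did not take the course
course_group = [27, 25, 30, 26, 31, 24, 29, 28]    # completed the course

print(round(cohens_d(control_group, course_group), 2))
```

Because d is expressed in standard deviation units, it can be compared across studies that measured the same construct on different scales.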

Effect Size Cont.
Here is a large problem:
  Effect size depends on what measure we use to operationalize the construct. For example, effect size depends on the net we use, the test we select, etc.
  Actual effect sizes may be much larger than observed effect sizes. What might be considered a moderate to large effect in a laboratory situation may appear as a small effect in the real world, where you can't control numerous sources of extraneous variance, e.g., variability in individual characteristics, treatment implementation, environmental characteristics (Cohen, 1988).
  Small effect sizes are common and should be expected in ex post facto and quasi-experimental situations (Cohen, 1988).

RELATIONSHIP OF MEASUREMENT, RESEARCH DESIGN, AND STATISTICAL POWER
This is just a conceptual introduction. We will return to validity of measurement in a future lecture.
All research depends on an operational definition of the constructs of interest. In intervention research, the operational definitions of both the treatments and the outcomes influence effect size.
As we are all aware, there are a variety of frames of reference regarding interventions and outcomes.

RELATIONSHIP OF MEASUREMENT, RESEARCH DESIGN, AND STATISTICAL POWER Cont.
Consider chapter 1 of the elephant fable, with the researchers who mapped different parts of the elephant. Their descriptions of the elephant differed considerably. What we see in research depends, at least in part, on what facet(s) of the construct of interest is (are) operationalized by our outcome measure(s).
It is always better to look at the construct in more than one way (more than one facet) in order to limit threats to validity from mono-operational bias. In other words, looking at the elephant from different angles can improve the degree to which our descriptions of the elephant actually describe the elephant.

RELATIONSHIP OF MEASUREMENT, RESEARCH DESIGN, AND STATISTICAL POWER Cont.
Now, consider measuring the same elephant with portable X-ray machines. Pictures of each part of the elephant are taken and then compared with each other. Not only do these pictures not resemble each other, but they also don't resemble the descriptions provided by the previous group of researchers. This chapter of the elephant fable shows how what we see is influenced by our method of observation or measurement. Again, a researcher interested in a deeper understanding of the elephant may choose multiple methods of measurement in order to avoid threats to validity from mono-method bias.

RELATIONSHIP OF MEASUREMENT, RESEARCH DESIGN, AND STATISTICAL POWER Cont.
The relationship of measurement, research design, and statistical power means that large treatment effects can actually be observed as small effects. In other words, even if an intervention is very effective, measurement and design complications may make the effect appear small and thus require high statistical power for detection.
The following telescope model depicts the interrelation. The effect is obscured when we only look at part of the construct of interest. The apparent effect size is then attenuated by the extent to which our operational definitions (including our measurement techniques) do not reliably and validly capture the construct of interest (i.e., intervention effectiveness).

Telescope Model
[Figure: telescope model. Labels: Actual Effect Size, Observed Effect Size, Measurement, Research Design, Statistical Design]
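One way to put a number on the attenuation described above is the classical test theory formula for a correlation between two imperfectly reliable measures: observed r = true r × sqrt(reliability of X × reliability of Y). This formula is not part of the slides; it is brought in here as an illustrative assumption, and the reliabilities below are hypothetical.

```python
import math

def attenuated_correlation(true_r, reliability_x, reliability_y):
    """Expected observed correlation when both measures contain random error.

    Classical test theory attenuation formula; reliabilities range from 0 to 1.
    """
    return true_r * math.sqrt(reliability_x * reliability_y)

# Hypothetical case: a true relationship of .50 between treatment exposure
# and outcome, observed through measures of decreasing reliability.
for reliability in (1.0, 0.9, 0.8, 0.6):
    observed = attenuated_correlation(0.50, reliability, reliability)
    print(f"reliability {reliability:.1f} -> observed r {observed:.2f}")
```

With perfectly reliable measures the observed and actual values match; as reliability drops, the same underlying relationship looks progressively smaller.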

RELATIONSHIP OF MEASUREMENT, RESEARCH DESIGN, AND STATISTICAL POWER Cont.
Apparent effect size is further attenuated when research design does not fully filter out extraneous sources of variation (e.g., counselor or client differences). Violations of assumptions of statistical procedures can further attenuate effect size. Interestingly, problems in research design and statistical design can also introduce sources of Type I error (e.g., dust on the lens or false positive results).
The relationship of effect size, measurement, and design is further complicated by the frame of reference or angle from which one approaches or operationalizes the construct. This complication is illustrated in the following figure.

Telescope Model 2
[Figure: telescope model 2]

RELATIONSHIP OF MEASUREMENT, RESEARCH DESIGN, AND STATISTICAL POWER Cont.
Validity is a key element of the relationship of effect size, measurement, and design. Clearly, qualitative methods can further valid operationalization of constructs. Multiple operational definitions and multiple methods, as recommended by Cook and Campbell (1979), can enhance the validity of research, including counseling effectiveness research.
Further, units of measurement should be carefully considered in planning research. Researchers considering the social and cultural context of behavior have questioned the reductionist tradition of separating acts, actors, and audiences, as well as the tendency to study behaviors without consideration of social and cultural mediation (see, e.g., Trueba, Rodriguez, Zou, & Cintron, 1993; Wertsch, 1991). Such questions clearly pose a challenge for effectiveness research.

PRE-ANALYSIS STATISTICAL POWER
Pre-analysis statistical power estimation is a recommended technique. The following steps will allow you to consider statistical power in research planning.
1. Estimate effect size from past research and the type of experimental design planned. When you are unsure, underestimate effect size so as not to overestimate power. Also, in quasi-experimental or ex post facto circumstances, it is usually best to estimate a small effect size unless otherwise indicated.
2. Decide on the exact statistical test and significance criterion.
3. Determine the acceptable level of power; .80 is nice, but .70 may be acceptable in some circumstances.

PRE-ANALYSIS STATISTICAL POWER Cont.
4. Use power tables for that statistical test or an appropriate computer program to determine the number of subjects required for the specified significance criterion and desired level of power.
5. If you have a fixed number of subjects, consider adjusting the significance criterion (alpha) or statistical design if necessary to obtain adequate power. Recall, the .05 significance criterion is not sacred, especially when it results in a power of less than .30 (i.e., less than a 30% chance of finding differences that actually exist).
(Szymanski & Parker, 1992)

ALPHA INFLATION
Multiple comparisons can increase alpha, the probability of a Type I error. Recall the fish story.
The probability of a Type I error escalates with the number of comparisons made in the study. The experiment-wise alpha is computed as:
  1 - (1 - alpha)^n
where n is the number of comparisons.
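The experiment-wise formula is easy to evaluate directly. The sketch below uses hypothetical function names and also shows the effect of dividing alpha by the number of comparisons; note that the formula treats the comparisons as independent, which is an assumption that real studies may not meet.

```python
def experimentwise_alpha(alpha, n_comparisons):
    """Probability of at least one Type I error across independent comparisons."""
    return 1 - (1 - alpha) ** n_comparisons

def split_alpha(alpha, n_comparisons):
    """Per-comparison alpha after dividing alpha by the number of comparisons."""
    return alpha / n_comparisons

for n in (1, 3, 5, 10, 20):
    inflated = experimentwise_alpha(0.05, n)
    controlled = experimentwise_alpha(split_alpha(0.05, n), n)
    print(f"{n:2d} comparisons: unadjusted {inflated:.3f}, after split {controlled:.3f}")
```

With ten comparisons at alpha = .05, the chance of at least one false positive is roughly 40%, while splitting alpha holds the experiment-wise rate at or just under .05.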

ALPHA INFLATION
As we discussed, one way to guard against alpha inflation is to use a Bonferroni-type procedure and to divide alpha by the number of comparisons. There are a variety of such procedures that can be used (see, e.g., Marascuilo & Serlin, 1988) according to the relative importance of the tested hypotheses.
The problem with reducing alpha is that it inflates beta.
In situations in which alpha inflation is accepted due to a problem with power, one must look to replications for confidence in the findings.
Again, let us consider Monet. One study, alone, tells us little. However, one study, considered in relationship to others, tells us about patterns or trends in the relationships among variables.

See Power Tables

Test Statistics, Probability, and Significance
In order to test a hypothesis, we compare the obtained value of a test statistic (e.g., the obtained F) to a critical value of the test statistic (e.g., a critical F) that is associated with the preset significance level (alpha).
If the obtained value of the test statistic is greater than the critical value, we determine that there is a significant difference or relationship.

Test Statistics, Probability, and Significance Cont.
Test Statistic: The specific statistic (i.e., the tool) that is chosen to test the null hypothesis. Examples include F, t, r.
Obtained Value: The actual value obtained when applying the test statistic to the data of interest. The probability value associated with the obtained value is p.
Critical Value: The critical value of the test statistic that is associated with the chosen significance level (alpha). If the obtained value is greater than the critical value, the result is significant.

Test Statistics, Probability, and Significance Cont.
Probability Value: The probability of obtaining a result at least as extreme as the one observed if the null hypothesis were true; loosely, the probability that observed relationships or differences are due to chance.
Alpha: Alpha is also known as significance level or rejection region. It is the level of probability set by the researcher as grounds for rejection of the null hypothesis (Williams, 1986, p. 58). Alpha is the probability level associated with the critical value of the test statistic. In other words, alpha is our predetermined risk that differences that we declare to be real are actually due to chance.
Obtained: This is also known as the obtained probability (p): significance of the test statistic. It is the probability that the data could have arisen if H0 were true (Cohen, 1994, p. 998).

Test Statistics, Probability, and Significance Cont.
Significance: What happens when the obtained probability p is less than our predetermined alpha. Significance also occurs when the obtained value of the test statistic is greater than the critical value of the test statistic.

  Test Statistic      Probability Value
  Critical Value      Significance Level (alpha)
  Obtained Value      Obtained or Actual Probability (p)

Note that larger obtained values of test statistics are generally related to smaller values of p.
  If Obtained Value > Critical Value, then * Significance *
  If p < Alpha, then * Significance *
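The two decision rules are two views of the same comparison, which a short check can make concrete. This sketch assumes a two-sided test on a z statistic (chosen because the normal distribution is available in the Python standard library); the obtained values fed in are hypothetical.

```python
from statistics import NormalDist

def decide(z_obtained, alpha=0.05):
    """Apply both decision rules to one obtained z value (two-sided test)."""
    norm = NormalDist()
    z_critical = norm.inv_cdf(1 - alpha / 2)       # critical value for alpha
    p = 2 * (1 - norm.cdf(abs(z_obtained)))        # obtained probability
    by_critical_value = abs(z_obtained) > z_critical
    by_p_value = p < alpha
    return z_critical, p, by_critical_value, by_p_value

for z in (0.5, 1.5, 2.5, 3.0):
    z_critical, p, rule_1, rule_2 = decide(z)
    print(f"z = {z}: critical = {z_critical:.2f}, p = {p:.4f}, "
          f"significant by value: {rule_1}, by p: {rule_2}")
```

For every obtained value the two rules return the same verdict, and the larger obtained values carry the smaller p-values.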

Revisit the Bell Curve
[Figure: bell curve]

Test Statistics, Probability, and Significance Cont.
WHETHER YOU ARE LOOKING AT OBTAINED VALUES OF TEST STATISTICS IN RELATION TO CRITICAL VALUES OR YOU ARE LOOKING AT ACTUAL PROBABILITY LEVELS, IT IS IMPORTANT TO NOTE THAT TEST STATISTICS AND THEIR ASSOCIATED PROBABILITIES ONLY TELL US THE PROBABILITY THAT A DIFFERENCE OR RELATIONSHIP OCCURRED BY CHANCE.
THESE STATISTICS DO NOT TELL US THE SIZE OF GROUP DIFFERENCES OR THE STRENGTH OF RELATIONSHIPS.

Steps in Hypothesis Testing for Quantitative Research Designs
Hypothesis testing is a 4-phase procedure:
  Phase I: Research Hypotheses, Design, and Variables
  Phase II: Statistical Hypotheses
  Phase III: Hypotheses Testing
  Phase IV: Decision/Interpretation

Phase I: Research Hypotheses, Design, and Variables
1. State your research hypotheses.
2. Decide on a research design based on your research problem, your hypotheses, and what you really want to be able to say about your results (e.g., if you want to say that A caused B, you will need an experimental or time-series design; if probable cause is sufficient, a quasi-experimental design would be appropriate).
3. Operationally define your variables. Recall that one variable can have more than one operational definition.

Phase II: Statistical Hypotheses
1. Consider your chosen statistical procedures.
2. Write one statistical null hypothesis for each operational definition of each variable, reflecting the statistical operations to be performed.

Phase III: Hypotheses Testing
Complete the following steps for each statistical null hypothesis:
1. Select a significance level (alpha).
2. Compute the value of the test statistic (e.g., F, r, t).
3. Compare the obtained value of the test statistic with the critical value associated with the selected significance level, or compare the obtained p-value with the pre-selected alpha value.
4. If the obtained value of the test statistic is greater than the critical value (or if the obtained p-value is less than the pre-selected alpha value), reject the null hypothesis. If the obtained value is less than the critical value of the test statistic, fail to reject the null hypothesis.
Another way of looking at it: If p is less than or equal to alpha, reject the null hypothesis.
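The Phase III steps can be walked through on a small data set. This is a sketch only: it swaps in a large-sample z approximation for the difference between two group means (a t test would be the usual choice for samples this small, but the t distribution is not in the Python standard library), and the exam scores are invented for illustration.

```python
from statistics import NormalDist, mean, variance

def two_group_z_test(group_1, group_2, alpha=0.05):
    """Test H0: mu1 - mu2 = 0 against H1: mu1 - mu2 != 0 (normal approximation)."""
    # Step 1: alpha is selected before looking at the data (function argument).
    # Step 2: compute the obtained value of the test statistic.
    difference = mean(group_1) - mean(group_2)
    standard_error = (variance(group_1) / len(group_1)
                      + variance(group_2) / len(group_2)) ** 0.5
    z = difference / standard_error
    # Step 3: compare the obtained p-value with the pre-selected alpha.
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    # Step 4: reject or fail to reject the null hypothesis.
    decision = "reject H0" if p < alpha else "fail to reject H0"
    return {"difference": difference, "z": z, "p": p, "decision": decision}

# Hypothetical Exam 2 scores.
taking_course = [82, 88, 75, 91, 85, 79, 90, 84]
not_taking_course = [78, 74, 80, 69, 77, 72, 81, 75]

result = two_group_z_test(taking_course, not_taking_course, alpha=0.05)
print(result)
```

The returned decision concerns only the null hypothesis; what it implies for the research hypothesis is a separate interpretive step.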

Phase IV: Decision/Interpretation
1. For each research hypothesis, consider the decisions regarding the statistical null hypotheses.
2. For each research hypothesis, consider qualitative contextual information relating to potential plausibility.
3. Cautiously explain your findings with respect to the research hypotheses.
4. List and discuss the limitations (threats to valid inference).
Note: Null hypothesis testing is currently under scrutiny (see, e.g., Cohen, 1994; Kirk, 1996). It is generally recommended that you report the effect size along with the value of the test statistic and the p-value. An alternative is to report confidence intervals.

Points to Consider about Hypotheses Testing
FISHING IN LAKE ALICE
We don't prove the null hypothesis.
If you go fishing on Lake Alice and you don't catch fish, you cannot conclude that there are no fish in the lake!!!

Points to Consider about Hypotheses Testing Cont.
What circumstances might keep us from finding fish in the lake? Possible problems include:
1. wrong or insensitive outcome measures (using the large net for small fish),
2. sampling problems (looking in the wrong part of the lake), or
3. methodological problems (scooping the top of the lake rather than the bottom, where the fish hang out).

Points to Consider about Hypotheses Testing Cont.
Returning to hypothesis testing: Failure to reject the null hypothesis cannot be interpreted as proof that no differences or relationships exist. Existing differences or relationships might be obscured by:
1. insensitive outcome measures (the wrong fishnet),
2. inappropriate statistical designs,
3. poor sampling strategies, and
4. low statistical power.

Questions about Hypotheses?

Class Activity