Lecture Notes Module 2


Two-group Experimental Designs

The goal of most research is to assess a possible causal relation between the response variable and another variable called the independent variable. In experimental designs, the response variable is usually called a dependent variable. Three basic conditions must be satisfied to demonstrate a causal relation between a dependent variable and an independent variable. First, there must be a relation between the dependent variable and the independent variable. Second, the observed change in the dependent variable must have occurred after a change in the independent variable. Third, no variable other than the independent variable can be responsible for the relation between the dependent variable and the independent variable.

An experiment can be used to assess a causal relation. The simplest type of experiment involves just two treatment conditions that represent the levels of the independent variable. In a two-group experiment, a random sample of n participants is selected from a study population. The random sample is then randomized (i.e., randomly divided) into two groups, and each group receives one of the two treatments, with participants treated identically within each group. If one group does not receive any treatment, it is called a control group. Following treatment, a measurement on the dependent variable is obtained for each participant.

In a two-group experiment with a quantitative dependent variable, a population mean can be estimated from each group. In an experimental design, the population means have interesting interpretations: $\mu_1$ is the population mean of the dependent variable assuming all participants in the study population had received level 1 of the independent variable, and $\mu_2$ is the population mean of the dependent variable assuming all participants in the study population had received level 2 of the independent variable. The difference in population means for the two treatment conditions, $\mu_1 - \mu_2$, is called the effect size and describes the strength of the relation between the dependent and independent variables. In an experiment, a nonzero effect size is evidence that the independent variable has a causal effect on the dependent variable because all three conditions required for a causal association will have been satisfied: 1) a nonzero effect size implies a relation between the dependent and independent variables, 2) the change in the dependent variable occurred after the change in the independent variable, and 3) because the participants were randomized into the levels of the independent variable, no other variable could have caused the nonzero effect size.
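The randomization step is easy to carry out in software. Here is a minimal R sketch, using 80 participants split evenly as in Example 2.1 below (the object names and the seed are arbitrary choices for the example):

```r
# Randomly divide a sample of n = 80 participants into two groups of 40
set.seed(123)                    # arbitrary seed, so the split is reproducible
ids    <- 1:80                   # participant identifiers
group1 <- sample(ids, 40)        # 40 randomly chosen for treatment level 1
group2 <- setdiff(ids, group1)   # the remaining 40 receive treatment level 2
```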

A confidence interval for $\mu_1 - \mu_2$ provides information about the direction and magnitude of the effect size.

Confidence Interval for a Mean Difference

A $100(1 - \alpha)\%$ confidence interval for $\mu_1 - \mu_2$ is

$$\hat{\mu}_1 - \hat{\mu}_2 \pm t_{\alpha/2;df}\,SE_{\hat{\mu}_1 - \hat{\mu}_2} \qquad (2.1)$$

where $t_{\alpha/2;df}$ is a critical t-value, $SE_{\hat{\mu}_1 - \hat{\mu}_2} = \sqrt{\hat{\sigma}_p^2/n_1 + \hat{\sigma}_p^2/n_2}$ is the estimated standard error of $\hat{\mu}_1 - \hat{\mu}_2$, $df = n_1 + n_2 - 2$, and $\hat{\sigma}_p^2 = [(n_1 - 1)\hat{\sigma}_1^2 + (n_2 - 1)\hat{\sigma}_2^2]/(n_1 + n_2 - 2)$. The standard error in Equation 2.1 is called a pooled-variance standard error because it assumes equal population variances and uses a pooled estimate of the common population variance.

In an experiment, recall that all participants within a particular treatment group should be treated identically. The within-group variance estimates, $\hat{\sigma}_1^2$ and $\hat{\sigma}_2^2$, represent unexplained variability in the dependent variable. The within-group variance is also referred to as error variance.

Example 2.1. A psychologist believes that it is important for 2nd grade students to overlearn the multiplication tables so that these computations can be made rapidly and without thought when students later begin working on more complex math problems. A population of 948 2nd grade students was identified in a particular school district, and 80 students were randomly selected from this study population. The 80 students were randomized into two groups of equal size. The first group was a control group and received no additional multiplication table training. The second group received 15 minutes per day of extra multiplication table training for 60 days. At the end of the 60-day training period, all 80 students were given a multiplication test, and the time (in seconds) to complete the test was recorded for each student. The sample means and standard deviations are given below.

Group 1: $\hat{\mu}_1 = 273.6$, $\hat{\sigma}_1 = 27.2$
Group 2: $\hat{\mu}_2 = 112.8$, $\hat{\sigma}_2 = 20.8$

The 95% confidence interval for $\mu_1 - \mu_2$ is

$$273.6 - 112.8 \pm t_{.05/2;df}\sqrt{\frac{586.2}{40} + \frac{586.2}{40}} = [150.0,\ 171.6]$$

where $df = 40 + 40 - 2 = 78$, $t_{.05/2;78} = 2.00$, and $\hat{\sigma}_p^2 = [(39)27.2^2 + (39)20.8^2]/78 = 586.2$. The psychologist is 95% confident that in the study population of 948 2nd grade students, the average time to complete the multiplication test would be 150.0 to 171.6 seconds faster if they had all received the extra math training for 60 days.
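The Example 2.1 interval can be reproduced from the summary statistics alone. A minimal R sketch (variable names are my own):

```r
# Pooled-variance 95% CI for mu1 - mu2, from the Example 2.1 summary statistics
m1 <- 273.6; s1 <- 27.2; n1 <- 40   # group 1 (control)
m2 <- 112.8; s2 <- 20.8; n2 <- 40   # group 2 (extra training)
df    <- n1 + n2 - 2
var_p <- ((n1 - 1)*s1^2 + (n2 - 1)*s2^2) / df   # pooled variance: 586.2
se    <- sqrt(var_p/n1 + var_p/n2)              # pooled-variance standard error
tcrit <- qt(1 - 0.05/2, df)                     # critical t-value (about 1.99)
(m1 - m2) + c(-1, 1) * tcrit * se               # approximately [150.0, 171.6]
```

The slight difference from the hand computation comes from rounding the critical t-value to 2.00 in the text.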

Hypothesis Testing

The confidence interval for $\mu_1 - \mu_2$ described above can be used to test hypotheses. For instance, a confidence interval for $\mu_1 - \mu_2$ may be used to implement a three-decision rule for the following hypotheses:

H0: $\mu_1 = \mu_2$
H1: $\mu_1 > \mu_2$
H2: $\mu_1 < \mu_2$

If the lower limit of the interval for $\mu_1 - \mu_2$ is greater than 0, then reject H0 and accept H1: $\mu_1 > \mu_2$.
If the upper limit of the interval for $\mu_1 - \mu_2$ is less than 0, then reject H0 and accept H2: $\mu_1 < \mu_2$.
If the confidence interval includes 0, then H0: $\mu_1 = \mu_2$ cannot be rejected (an inconclusive result).

In a two-group design, the test of H0: $\mu_1 = \mu_2$ is commonly referred to as an independent-samples t-test and involves the computation of the test statistic $t = (\hat{\mu}_1 - \hat{\mu}_2)/SE_{\hat{\mu}_1 - \hat{\mu}_2}$. Statistical packages such as SPSS and R compute a p-value for the t statistic (an R illustration appears after the Assumptions section below). If the p-value is less than $\alpha$, then H0 is rejected and it is common to declare the results to be "significant"; otherwise, the results are declared to be "nonsignificant". Of course, a significant result does not imply that an important difference in population means has been detected, and a nonsignificant result does not imply that the null hypothesis is true. Most psychology journals now require authors to report the t-value, df, and p-value along with a confidence interval for $\mu_1 - \mu_2$.

Two-group Nonexperimental Designs

The confidence interval for $\mu_1 - \mu_2$ (Equation 2.1) can also be applied to nonexperimental designs where participants are classified into two groups according to some preexisting characteristic (male/female, democrat/republican, freshman/sophomore, etc.) rather than being randomly assigned to treatment conditions. In nonexperimental designs, the magnitude of $\mu_1 - \mu_2$ describes the strength of a relation between the dependent variable and the independent variable. In nonexperimental designs, an observed relation between the independent variable and the dependent variable cannot be interpreted as a causal relation because the relation may be due to one or more unmeasured variables, called confounding variables, that are related to both the dependent variable and the independent variable. For example, many nonexperimental studies have compared moderate alcohol drinkers with non-drinkers and found that moderate drinkers live longer.

However, moderate drinkers may differ from nondrinkers in education level, income, access to health care, moderation in consumption of unhealthy foods, and many other characteristics. It is possible that one or more of these confounding variables is responsible for the observed relation between alcohol consumption and life expectancy. Therefore, the nonexperimental finding that alcohol consumption is related to life expectancy does not imply that a nondrinker will live longer if that person begins to drink alcohol in moderation.

In a nonexperimental design, the parameters also have a different interpretation. Specifically, $\mu_1$ is the population mean of the dependent variable for all people in the study subpopulation who belong to one category (e.g., male, democrat, freshman), and $\mu_2$ is the population mean of the dependent variable for all people in the study subpopulation who belong to the other category (e.g., female, republican, sophomore). The subtle but important differences in parameter interpretation between experimental and nonexperimental designs will affect how the psychologist describes the results of a confidence interval or hypothesis test.

Assumptions for Confidence Intervals and Tests

The confidence interval for $\mu_1 - \mu_2$ assumes: 1) random sampling, 2) independence among participants, 3) the dependent variable has an approximately normal distribution in the study population for each treatment condition or subpopulation, and 4) equal population variances for each treatment condition or subpopulation (the equal variance assumption). Violating the normality assumption will not be a concern if the sample sizes per group are not too small ($n_j > 20$). Violating the equal variance assumption will not be a concern if $n_1$ and $n_2$ are not too dissimilar. However, the confidence interval for $\mu_1 - \mu_2$ can perform very poorly when the population variances are unequal and the sample sizes are unequal. This problem is most serious when the smaller sample is used in the treatment condition with the larger population variance. Both SPSS and R will compute a confidence interval for $\mu_1 - \mu_2$ that uses a separate-variance standard error

$$SE_{\hat{\mu}_1 - \hat{\mu}_2} = \sqrt{\hat{\sigma}_1^2/n_1 + \hat{\sigma}_2^2/n_2}$$

that does not require equal population variances, and this option should be used when one sample is considerably larger than the other. When the separate-variance standard error is used, the degrees of freedom is

$$df = \left(\frac{\hat{\sigma}_1^2}{n_1} + \frac{\hat{\sigma}_2^2}{n_2}\right)^{2} \Bigg/ \left[\frac{\hat{\sigma}_1^4}{n_1^2(n_1 - 1)} + \frac{\hat{\sigma}_2^4}{n_2^2(n_2 - 1)}\right]$$

rather than $n_1 + n_2 - 2$.
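Both versions of the test are available through R's t.test function: the pooled-variance test from the Hypothesis Testing section uses var.equal = TRUE, and the separate-variance (Welch) test is R's default. A minimal sketch; the scores below are simulated only so the calls are runnable, not data from the lecture:

```r
# Independent-samples t-test in R: pooled vs. separate-variance standard errors
set.seed(1)                                # artificial data, for illustration
y1 <- rnorm(40, mean = 273.6, sd = 27.2)   # simulated group 1 scores
y2 <- rnorm(40, mean = 112.8, sd = 20.8)   # simulated group 2 scores
t.test(y1, y2, var.equal = TRUE)   # pooled-variance test, df = n1 + n2 - 2
t.test(y1, y2)                     # separate-variance (Welch) test, R's default
```

Each call reports the t-value, df, p-value, and the confidence interval for $\mu_1 - \mu_2$, which is exactly the set of results most psychology journals now require.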

A transformation of the dependent variable scores can reduce skewness and unequal variability between groups, but then $\mu_1$ and $\mu_2$ may become difficult to interpret. Interpretation difficulty is usually not an issue in hypothesis testing applications where the goal is simply to decide whether $\mu_1$ is less than $\mu_2$ or $\mu_1$ is greater than $\mu_2$.

Sample Size Requirements

The sample size requirement per group to estimate $\mu_1 - \mu_2$ with desired confidence and precision is approximately

$$n_j = 8\sigma^2(z_{\alpha/2}/w)^2 \qquad (2.2)$$

where $\sigma^2$ is a planning value of the average within-group variance of the dependent variable for the two groups and $w$ is the desired width of the confidence interval. The planning value can be specified using information from published research reports, a pilot study, or the opinions of experts. If prior estimates of the dependent variable's variance are unavailable but the maximum and minimum values of the dependent variable are known, the planning value of the variance could be set to $[(\max - \min)/4]^2$.

Example 2.2. A psychologist wants to conduct a study to determine the effect of achievement motivation on the types of tasks a person chooses to undertake. The study will ask participants to play a ring-toss game in which they try to throw a small plastic ring over an upright post. The participants will choose how far from the post they stand when they make their tosses. The chosen distance from the post is the dependent variable. The independent variable is the degree of achievement motivation (high or low) and will be manipulated by the type of instructions given to the participants. The results of a pilot study suggest that the variance of the distance scores is about $0.75^2$ in each condition. The psychologist wants a 99% confidence interval for $\mu_1 - \mu_2$ to have a width of about 1 foot. The required sample size per group is approximately $n_j = 8(0.75^2)(2.58/1)^2 = 29.9 \approx 30$. (This calculation is scripted in the R sketch below.)

Unequal Sample Sizes

Using equal sample sizes has two major benefits: if the population variances are approximately equal, confidence intervals are narrowest and hypothesis tests are most powerful when the sample sizes are equal, and the negative effects of violating the equal variance assumption are less severe when the sample sizes are equal. However, there are situations in which equal sample sizes are less desirable. If one treatment is more expensive or risky than another, the psychologist might decide to use fewer participants in the more expensive or risky treatment condition. Also, in experiments that include a control group, it might be easy and inexpensive to obtain a larger sample size for the control group.
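Equation 2.2 is simple enough to wrap in a small R helper that reproduces Example 2.2 (the function name is my own, not from the lecture):

```r
# Approximate per-group sample size from Equation 2.2
nj_for_ci <- function(var_plan, conf_level, width) {
  z <- qnorm(1 - (1 - conf_level)/2)      # z critical value, e.g., 2.58 for 99%
  ceiling(8 * var_plan * (z / width)^2)   # round up to a whole participant
}
nj_for_ci(var_plan = 0.75^2, conf_level = 0.99, width = 1)   # Example 2.2: 30
```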

Graphing Results

The sample means for each group can be presented graphically using a bar chart. A bar chart for a two-group design consists of two bars, one for each group, with the height of each bar representing the value of the sample mean. Bar charts of sample means can be misleading because the sample means contain sampling error of unknown magnitude and direction. There is a tendency to incorrectly interpret the difference in bar heights as representing a difference in the population means. This misinterpretation can be avoided by graphically presenting the imprecision of the sample means with 95% confidence interval lines for each population mean; the R sketch below shows one way to draw such a chart.

Internal Validity

Recall that one of the fundamental requirements for declaring a relation between two variables to be causal is that the independent variable must be the only variable affecting the dependent variable. When this requirement is not satisfied, we say the internal validity of the study has been compromised. In nonexperimental designs, there will be many obvious confounding variables. However, in an experimental design, a confounding variable might go undetected. Consider the following example. Suppose a two-group experiment for the treatment of anxiety is conducted, with one group receiving a widely used medication and the second group receiving a promising new drug. Suppose a statistical analysis suggests that the new drug is more effective in reducing anxiety than the old drug. However, the psychologist cannot be sure that the new drug caused the improvement in anxiety because patients who received the new drug also received extra safety precautions to monitor for possible negative side effects. These extra precautions involved more supervision and contact with the patients. It is possible that the additional supervision, and not the new drug, caused the improvement in the patients.
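The chart described in the Graphing Results section can be drawn with a minimal base-R sketch, here using the Example 2.1 summary statistics (the group labels and axis text are my own choices):

```r
# Bar chart of the Example 2.1 group means with 95% CI lines (base R)
means <- c(273.6, 112.8); sds <- c(27.2, 20.8); n <- c(40, 40)
se    <- sds / sqrt(n)                  # standard error of each sample mean
tcrit <- qt(0.975, n - 1)               # critical t-value for each group
lo    <- means - tcrit * se             # lower 95% CI limits
hi    <- means + tcrit * se             # upper 95% CI limits
mid <- barplot(means, names.arg = c("Control", "Training"),
               ylim = c(0, max(hi) * 1.1),
               ylab = "Mean completion time (seconds)")
arrows(mid, lo, mid, hi, angle = 90, code = 3, length = 0.1)  # CI lines
```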

Differential attrition is another problem that threatens internal validity. Differential attrition occurs when the independent variable causes the participants in one treatment condition to withdraw from treatment with higher probability than participants in another treatment condition. With differential attrition, participants who complete the study could differ across treatment conditions in terms of some important attribute that would then be confounded with the independent variable. Consider the following example. Suppose a psychologist conducts an experiment to evaluate two different methods of helping people overcome their fear of public speaking. One method requires participants to practice with an audience of 20 and the other method requires participants to practice with an audience of 5. Fifty participants were randomly assigned to each of the two training conditions, but ten dropped out of the first group and only one dropped out of the second group. The results showed that public speaking fear was lower under the first method (audience of 20). However, it is possible that the participants who stayed in the first group were initially less fearful than those who dropped out, and it is this difference in initial fearfulness that resulted in lower fear scores in the first training condition.

External Validity

External validity is the extent to which the results of a study can be generalized to different types of participants and different types of research settings. In terms of random sampling, it is usually easier to sample from a small homogeneous study population than from a larger and more heterogeneous study population. However, the external validity of the study will be greater if the psychologist samples from a larger and more diverse study population. Other ways to increase the external validity of a study will be discussed in Module 3.

Nonrandom attrition occurs when certain types of participants, regardless of treatment condition, drop out of the study with higher probability than other participants. With nonrandom attrition, the participants who complete the study are no longer a random sample from the original study population. The remaining participants could be assumed to be a random sample from a smaller study population of participants who would have completed the study. This change in the size and nature of the study population decreases the external validity of the study. With random attrition in both groups, the samples remain random samples from the original study population with no loss of external or internal validity. However, a random loss of participants will result in a loss of power and confidence interval precision due to the smaller sample size.

Ethical Issues

Any study that uses human subjects should advance knowledge and potentially lead to improvements in the quality of life, but the psychologist also has an obligation to protect the rights and welfare of the participants in the study. These two goals are often in conflict and lead to ethical dilemmas. The most widely used approach to resolving ethical dilemmas is to weigh the potential benefits of the research against the costs to the participants. Evaluating the costs and benefits of a proposed research project that involves human subjects can be extremely difficult, and this task is assigned to the Institutional Review Board (IRB) at most universities. Psychologists who plan to use human subjects in their research must submit a written proposal to the IRB for approval. The IRB carefully examines all proposals in terms of the following issues:

Informed consent: Are participants informed of the nature of the study, have they explicitly agreed to participate, and are they allowed to freely decline to participate?

Coercion to participate: Were participants coerced into participating or offered excessive inducements?

Confidentiality: Will the data collected from participants be used only for research purposes and not divulged to others?

Physical and mental stress: Does the study involve more than minimal risk? Minimal risk is defined as risk that is no greater in probability or severity than that ordinarily encountered in daily life or during a routine physical or psychological exam.

Deception: Is deception truly needed in the study? If deception is used, are participants debriefed? Debriefing is used to: 1) clarify the nature of the study to the participants, 2) reduce any stress or anxiety caused by the study, and 3) obtain feedback from participants about the nature of the study.

In addition to principles governing the treatment of human subjects, psychologists are bound by a set of ethical standards. Violation of these standards is called scientific misconduct. There are three basic types of scientific misconduct:

Scientific dishonesty: Examples include the fabrication or falsification of data and plagiarism. Plagiarism is the use of another person's ideas, processes, results, or words without giving appropriate credit.

Unethical behavior: Examples include sexual harassment of research assistants or research participants, abuse of authority, failure to follow university or government regulations, inappropriately including or excluding authors on a research report or conference presentation, and providing a biased review of a manuscript or grant proposal.

Questionable research practices: Examples include performing an exploratory analysis of many dependent and independent variables and reporting only the variables that yield a significant result, deleting legitimate data that adversely affect the desired result, and reporting an unexpected finding as if it had been predicted from theory.