The Basics of Experimentation

Overview of Experiments. IVs & DVs. Operational Definitions. Reliability. Validity. Internal vs. External Validity. Classic Threats to Internal Validity. Lab: FP Overview; Help on OBS Project &/or Assign 8.

Overview of Experimentation

Experiments are the most powerful research methods in science because, if done correctly (i.e., if high in internal validity), they allow us to establish cause & effect. Establishing cause & effect is the first step toward explanation & control.

The Main Components of Any Experiment:
1. Statement of a Hypothesis.
2. Random Assignment of Participants to Conditions.
3. Manipulation of Antecedent Conditions (IVs).
4. Measurement of Behavior (DVs).
5. (Statistical) Analysis of Results.

A hypothesis is a (concrete, testable, falsifiable) statement about the relation between two or more variables [If IV, then DV].

Independent Variable (IV): What the E (experimenter) manipulates. Examples include degree of anxiety, noise level, amount of training, time between study & test, color of survey, position of product, type of therapy, amount of counseling, etc.

Levels of an IV: The different values of the variable. All IVs have at least two levels, and often more.
1 IV: Degree of Anxiety; 2 Levels: Calm, Anxious.
1 IV: Degree of Anxiety; 3 Levels: Low, Moderate, High.
1 IV: Color of Survey; 4 Levels: White, Red, Blue, Yellow.
1 IV: Noise Level; 5 Levels: 20 dB, 40 dB, 60 dB, 80 dB, 100 dB.
1 IV: Type of Therapy; 2 Levels: Drug, Cognitive-Behavioral.
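Random assignment (component 2 above) is simple to implement. A minimal sketch in Python; the participant labels, condition names, and helper name are hypothetical, not part of any required procedure:

```python
import random

def randomly_assign(participants, conditions, seed=None):
    """Shuffle participants, then deal them round-robin into conditions,
    giving (near-)equal group sizes with no systematic selection bias."""
    rng = random.Random(seed)
    pool = list(participants)
    rng.shuffle(pool)
    groups = {c: [] for c in conditions}
    for i, p in enumerate(pool):
        groups[conditions[i % len(conditions)]].append(p)
    return groups

# Hypothetical example: 8 participants, one IV (Degree of Anxiety), 2 levels.
groups = randomly_assign(["P1", "P2", "P3", "P4", "P5", "P6", "P7", "P8"],
                         ["Calm", "Anxious"], seed=42)
print({c: len(ps) for c, ps in groups.items()})  # two groups of 4
```

Fixing the seed only makes the sketch reproducible for demonstration; in a real experiment the shuffle would be left unseeded.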
Dependent Variable (DV): What the E measures. Examples include % errors, % correct responses, exam score, degree of affiliation, level of extroversion, # of aggressive acts, change in academic performance, sales, etc. DVs do not have levels (although one might measure more than one thing, or one thing in more than one way).

Operational Definitions: Defining variables in terms of specific/concrete operations or procedures, leaving no room for guesswork or interpretation, so that they can be reproduced by another (naive) E.
Experimental Operational Definitions: The precise procedures involved in implementing and/or manipulating an IV.
Measured Operational Definitions: The precise procedures involved in measuring a DV.

Reliability & Validity

Reliability & Validity are at the heart of measurement.

Reliability refers to whether a measure is consistent. Is my measuring instrument (test, survey, scale, etc.) dependable? Can I count on it to give the same scores over people, items, & time? In general, reliability is assessed via correlational measures that are interpreted similarly to Pearson correlations.

Validity refers to whether an instrument (test) truly measures the variable we think it measures (or want it to measure). Is my measure (of IQ, Depression, Memory, etc.) really getting at the construct I'm interested in? In general, validity is also assessed via correlational measures, but usually in more sophisticated (theory-based) ways than those associated with reliability.

Without reliable & valid measures there's no reason to do experiments.

Reliability

Inter-rater Reliability: Do two or more people produce the same ratings or measurements? Often assessed via (a) average % agreement, (b) Cohen's Kappa, or (c) average r over raters.

Inter-item Reliability (aka Internal Consistency): Do different parts of a test produce similar measurements?
Split-half technique: Correlation between two halves of a test or survey (e.g., 1st & 2nd half; odd & even items).
Cronbach's Alpha (α): Average r among all possible split-halves.

Test-Retest Reliability: Does a test produce similar measurements over time (e.g., at Time 1 & Time 2)? Treated as a special case of split-half; assessed as a correlation between Test 1 & Test 2 scores.
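These reliability indices are straightforward to compute. A minimal sketch in Python; the rater codes and survey responses are made up, and alpha is computed with the usual variance formula rather than by literally averaging all split-halves:

```python
from statistics import mean, pvariance

def percent_agreement(r1, r2):
    """Proportion of cases on which two raters gave the same rating."""
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)

def cohens_kappa(r1, r2):
    """Agreement corrected for chance agreement: (Po - Pe) / (1 - Pe)."""
    n = len(r1)
    po = percent_agreement(r1, r2)
    cats = set(r1) | set(r2)
    pe = sum((r1.count(c) / n) * (r2.count(c) / n) for c in cats)
    return (po - pe) / (1 - pe)

def cronbach_alpha(items):
    """items: one list of scores per test item (same respondents in order).
    alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores))."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_var = sum(pvariance(it) for it in items)
    return k / (k - 1) * (1 - item_var / pvariance(totals))

# Hypothetical data: two raters coding 10 acts as aggressive (1) or not (0).
rater1 = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
rater2 = [1, 0, 1, 0, 0, 0, 1, 0, 1, 1]
print(percent_agreement(rater1, rater2))          # 0.9
print(round(cohens_kappa(rater1, rater2), 3))     # 0.8

# Hypothetical 4-item survey answered by 5 respondents (one row per item).
items = [[3, 4, 2, 5, 4],
         [3, 5, 1, 5, 4],
         [2, 4, 2, 4, 3],
         [3, 5, 2, 5, 5]]
print(round(cronbach_alpha(items), 3))            # 0.965
```

Note how kappa (0.8) is lower than raw agreement (0.9): two raters flipping coins would agree part of the time, and kappa discounts that chance agreement.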
Face & Content Validity

Face & Content Validity are non-empirical (non-data-based) forms of validity, based on argument and theory.

Face Validity: Does my instrument (test, survey, question, etc.) look or feel like it's measuring what it's supposed to measure? Measures with high face validity may not always provide truly valid indices of the construct we are trying to measure (e.g., survey questions about racism/prejudice; crossword puzzles as IQ tests). Measures with low face validity can still provide excellent measures of the construct we are after (e.g., Perceptual-Motor Speed or Raven's Progressive Matrices as indices of IQ).

Content Validity: Is my instrument capturing the entirety of the construct I'm trying to measure? The extent to which the measure (a) includes relevant aspects of the construct and (b) excludes irrelevant aspects of the construct. E.g., does the SAT measure more than academic achievement?

Construct Validity

Construct Validity looks at how well a measure (or factor) captures the relevant aspects of a construct and excludes irrelevant aspects. It can be thought of as the hard-core, data-based version of content validity.
Convergent Validity: Does my instrument correlate with other instruments designed to measure the same construct? Does my measure of Intelligence (Raven's, crossword puzzle ability) correlate with other measures of intelligence (WAIS)?
Divergent Validity: Does my instrument not correlate with instruments designed to measure different constructs? Does my measure of Intelligence (Raven's, crossword puzzle ability) correlate with measures of memory, personality, athletic ability, etc.?

Criterion Validity

Criterion Validity looks at the degree to which a measure (or factor) is related to other (often real-world) outcomes. It can be thought of as the hard-core, data-based version of face validity.
Concurrent Validity: Does my instrument predict (correlate with) a currently available outcome related to the construct?
Does my measure of Intelligence (Raven's, SAT, crossword puzzle ability) correlate with current GPA?
Predictive Validity: Does my instrument predict (correlate with) a future outcome related to the construct? Does my measure of Intelligence (Raven's, SAT, crossword puzzle ability) correlate with GPA at graduation (or future job/pay level)?
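All of these data-based validity checks reduce to correlations. A minimal Pearson r sketch in Python; the student scores below are entirely made up for illustration:

```python
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation: covariance scaled by the product of the SDs."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Hypothetical scores for 6 students.
raven  = [28, 34, 22, 30, 36, 25]               # new intelligence measure
wais   = [105, 118, 96, 110, 122, 100]          # established intelligence test
sprint = [13.4, 13.0, 12.8, 13.6, 12.4, 12.2]   # 100 m time: different construct

print(round(pearson_r(raven, wais), 2))    # convergent evidence: r is high
print(round(pearson_r(raven, sprint), 2))  # divergent evidence: r is near zero
```

The same function covers criterion validity: correlate the measure with current GPA (concurrent) or GPA at graduation (predictive) instead of with another test.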
Different Kinds of Validity

Validity
  Non-Data-Based: Face, Content
  Data-Based:
    Construct: Convergent, Divergent
    Criterion: Concurrent, Predictive

Internal & External Validity

Up until now we have been discussing the validity of measurements. Internal & External Validity are concerned with the validity of an experiment as a whole.

Internal Validity: The degree to which a research design allows you to make causal statements (or draw firm conclusions). Well-designed experiments have high internal validity. Experiments with confounds have low internal validity.

External Validity: The degree to which research findings generalize to people or situations outside the research setting. As a general rule, high external validity (generalization) will depend on high internal validity (a well-designed experiment).

Classic Threats to Internal Validity

The internal validity of an experiment can be lowered by many things (e.g., bad design, noisy room, etc.). However, there are eight "classic" threats that should be especially guarded against.

1. History: Any outside event that affects the DV. Especially a problem when multiple measures are taken over time, or when different groups of subjects are tested at different points in time.
2. Maturation: Any physical or psychological change that affects the DV. Mainly a problem when measures are taken over time (pre-/post-testing). Examples include fatigue, boredom, increased knowledge, etc.
3. Testing: Any change in performance due to prior test experience. Problematic when measures are taken over time (e.g., Within-Ss Designs). Examples include practice effects & test pre-sensitization.
4. Instrumentation: Changes in a measurement device, or in the criteria used by observers for recording behavioral events. E.g., changes in the sensitivity of a button; different criteria for "violence".
5. Regression to the Mean: Movement away from an extreme value toward the mean value. Mainly a problem in measures-over-time (pre-/post-test) designs.
6. Selection: Anything that results in non-equivalent groups being exposed to different treatment conditions. E.g., ex post facto designs; non-random assignment; self-selection.
7. Attrition (aka Mortality): Differential loss of subjects from particular treatment groups or conditions. Often a problem in drug or aging studies, or when more difficult or boring conditions are compared to easier or more interesting conditions.
8. Selection Interactions: When any of the above threats affects one treatment group more than another. E.g., History or Testing effects on Males vs. Females, or Young vs. Old.

Final Project Overview I

The Final Project for this course is a Paper [50 points] & Presentation [20 points] based on a 2x2 factorial experiment [2 IVs with 2 levels each]. Experiments will be designed & conducted by teams of 4 students. Topic ideas will be handed out in class.

I strongly encourage each team to (a) get started early in designing and setting up their experiment, and (b) conduct test runs of their experiment before actual data are collected. Two class periods have been set aside for data collection (see course Schedule). You may also collect data outside of these periods, but all experiments must be reviewed & cleared by me before any data are collected.

Final Project Overview II

Paper: A complete report of your experiment. Papers should (a) strictly follow APA format, (b) use the hourglass method, & (c) include at least three references to papers you have read. Grading will be based on the format of your report as well as its content (including stats). More detailed grading guidelines will be posted on the web.

Presentation: Each team will give a brief (10-15 minute) presentation of their experiment to the class on Dec. 4.
Presentations should be in PowerPoint and include an Intro, Method, Results & Discussion, with each member presenting one section.

To receive full credit, PowerPoint presentations must be e-mailed to MethodsTA@yahoo.com by 5pm on Dec. 4. To receive full credit, hard copies of your final report must be in my mailbox by 5pm on Dec. 6.
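The stats for a 2x2 factorial design come down to two main effects and one interaction. A minimal sketch of the F ratios in Python, assuming equal cell sizes; the IV names and recall scores are made up and are not from any course dataset:

```python
from statistics import mean

def anova_2x2(cells):
    """cells[(a_level, b_level)] -> list of DV scores; equal cell sizes assumed.
    Returns (F_A, F_B, F_AxB); each effect has 1 df in a 2x2 design."""
    n = len(next(iter(cells.values())))                 # scores per cell
    a_levels = sorted({a for a, b in cells})
    b_levels = sorted({b for a, b in cells})
    grand = mean(s for sc in cells.values() for s in sc)
    cell_m = {k: mean(v) for k, v in cells.items()}
    a_m = {a: mean(cell_m[(a, b)] for b in b_levels) for a in a_levels}
    b_m = {b: mean(cell_m[(a, b)] for a in a_levels) for b in b_levels}

    ss_a = n * len(b_levels) * sum((a_m[a] - grand) ** 2 for a in a_levels)
    ss_b = n * len(a_levels) * sum((b_m[b] - grand) ** 2 for b in b_levels)
    ss_ab = n * sum((cell_m[(a, b)] - a_m[a] - b_m[b] + grand) ** 2
                    for a in a_levels for b in b_levels)
    ss_err = sum((s - cell_m[k]) ** 2 for k, v in cells.items() for s in v)
    ms_err = ss_err / (len(cells) * (n - 1))            # error df = 4(n - 1)
    return ss_a / ms_err, ss_b / ms_err, ss_ab / ms_err

# Made-up example: IV1 = Noise (quiet/loud), IV2 = Caffeine (no/yes),
# DV = words recalled; 4 participants per cell.
data = {("quiet", "no"):  [14, 15, 13, 16],
        ("quiet", "yes"): [17, 18, 16, 17],
        ("loud", "no"):   [9, 10, 8, 11],
        ("loud", "yes"):  [15, 14, 16, 13]}
f_noise, f_caffeine, f_inter = anova_2x2(data)
```

Each F would then be compared against the F distribution with (1, 12) degrees of freedom to decide significance; libraries like SciPy can supply the p-values.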