Experimental design (continued)
Spring 2017
Michelle Mazurek
Some content adapted from Bilge Mutlu, Vibha Sazawal, Howard Seltman
Administrative
- No class Tuesday
- Homework 1
- Plug for Tanu Mitra grad student session
Today's class
- Finish threats to validity
- Experimental design / choices
- Alternatives to experiments
Quick review
- Internal validity: causality
  - Isolate the variable of interest
  - Randomized assignment
- External validity
  - Representative sample
  - Representative environment/task/analysis
- Valid constructs
  - Measure something meaningful
  - Reliable
Know what you're measuring
- Especially when dealing with large-scale data from the internet:
  - What are you missing? What is duplicated?
  - What is the precision and accuracy of the data?
  - Are you capturing what you think you're capturing? (Vantage point)
  - Representativeness / diversity
Calibrating constructs
- Examine outliers and spikes
- Check for self-consistency
- Compare multiple measures
  - Multiple datasets
  - Multiple ways of calculating a value
- Test with synthetic data
- Check longitudinal data periodically!
Mis-measurements: now what?
- Discard everything? (Why might this be bad?)
- Discard only outliers? By what definition?
- Use an explicit adjustment?
Other measurement notes
(Don't really fit here, but from the Paxson paper)
- Metadata and good analysis logging are critical!
- Be clear about unknowns and limitations
4. Power
- Power: the likelihood that, if there's a real effect, you will find it.
- Why might you not find it?
  - Sample size
  - Effect size
  - Missing explanatory variables
  - Variability
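Power can be made concrete by simulation: repeatedly draw samples containing a known true effect and count how often a test detects it. This is a minimal sketch, not from the slides; the effect size (0.5 SD), group size, test (Welch's t with a normal-approximation cutoff), and number of simulation runs are all assumed values for illustration.

```python
import math
import random
import statistics

def t_statistic(a, b):
    """Welch's t statistic for two independent samples."""
    va, vb = statistics.variance(a), statistics.variance(b)
    se = math.sqrt(va / len(a) + vb / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se

def estimated_power(effect=0.5, n=30, runs=2000, crit=1.96):
    """Fraction of simulated experiments that detect a true effect.

    crit=1.96 is the large-sample two-sided 5% cutoff, used here
    as an approximation instead of exact t-distribution quantiles.
    """
    random.seed(0)  # fixed seed so the sketch is reproducible
    hits = 0
    for _ in range(runs):
        control = [random.gauss(0.0, 1.0) for _ in range(n)]
        treated = [random.gauss(effect, 1.0) for _ in range(n)]
        if abs(t_statistic(treated, control)) > crit:
            hits += 1
    return hits / runs
```

Rerunning with a larger `n` shows the slide's point directly: with the same effect size, more participants means a higher chance of detecting it.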
Promote power
- Covariates: measure possible confounds, include them in analysis
- Use reliable measurements
- Control the environment
- Potential tradeoff: generalizability for power
  - E.g., limit variability between subjects
EXPERIMENTAL DESIGN
Some important decisions
- What is the hypothesis?
- Between or within subjects?
- What treatment levels / conditions?
- What dependent variables to measure?
Good hypothesis design
- Predicted relationship between (at least) two variables
- Testable, falsifiable
- Operational
  - Variables are clearly defined
  - Relationship, and how you measure it, clearly defined
Good hypothesis design (cont.)
- Justified
  - Exploratory results
  - Theory in a related area
  - Well-justified intuition?
- Parsimonious
Between vs. within
- Between: each participant belongs to exactly one condition
- Within: each participant belongs to multiple conditions
Between vs. within: tradeoffs
- Between: needs more participants, but is cleaner / less biased (no carryover between conditions)
- Within: takes more time per participant, but has more power (less subject-to-subject variability)
Improving on between-subjects
- Matching: recruit like participants for each condition
  - Pro: reduces variability
  - Con: hard to find matches; what do you match on?
- In general, be very cautious
Improving on within-subjects
- Ordering effects can be HUGE
  - Learning, fatigue
  - Range effects: learn most for closest conditions
- Mitigate via counterbalancing
  - All possible orders
  - Balanced Latin square:
    A B C D
    C A D B
    B D A C
    D C B A
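One standard way to build a balanced Latin square for an even number of conditions is the Williams construction: lay out a first row as 1, 2, n, 3, n-1, ... and cyclically shift it for each subsequent participant. The sketch below is illustrative, not from the slides; it produces a valid balanced square (each condition once per position, each ordered pair of adjacent conditions exactly once), though not necessarily the particular square shown on the slide.

```python
def balanced_latin_square(conditions):
    """Williams-design balanced Latin square (even n only).

    Each condition appears once in every ordinal position, and
    every condition immediately follows every other condition
    exactly once across all rows.
    """
    n = len(conditions)
    assert n % 2 == 0, "this simple construction needs an even number of conditions"
    # First row indices: 0, 1, n-1, 2, n-2, 3, ...
    first = [0]
    lo, hi = 1, n - 1
    while len(first) < n:
        first.append(lo)
        lo += 1
        if len(first) < n:
            first.append(hi)
            hi -= 1
    # Row i is the first row cyclically shifted by i.
    return [[conditions[(j + i) % n] for j in first] for i in range(n)]
```

For odd numbers of conditions, a single square cannot balance all ordered pairs; the usual fix is to use both the square and its mirror image.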
Counterbalancing doesn't fix:
- Range effects (most average treatment)
- Context effects (what most participants are more familiar with)
Mixed designs are also possible
- Everyone gets the same three tasks
- Order of tasks varies
- Tool with which to execute tasks varies
Selecting conditions
- How many IVs?
  - Password meter example
- How many / which levels for each?
- Cannot infer anything about levels you didn't test
Full-factorial (or not)
- Full-factorial: all possible combinations of all IVs
  - And all orderings?
- Not full-factorial: only a subset
  - Selected how?
  - Recall: vary at most one thing each time!
  - Planned comparisons!
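Enumerating a full-factorial design is mechanical: take the Cartesian product of all IV levels. A small sketch, using hypothetical levels loosely inspired by the password-meter example (the IV names and levels are made up, not from the slides):

```python
from itertools import product

# Hypothetical IVs and levels for illustration.
ivs = {
    "meter": ["none", "standard", "strict"],  # 3 levels
    "policy": ["8-char", "12-char"],          # 2 levels
}

# Full-factorial: every combination of every level -> 3 x 2 = 6 cells.
conditions = [dict(zip(ivs, combo)) for combo in product(*ivs.values())]
```

A non-full-factorial design would keep only a planned subset of these cells; the slide's caution applies: each retained comparison should vary at most one thing at a time.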
Why multivariate?
- What is different between running one experiment with two IVs vs. two experiments with one IV each?
- Interaction effects!
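A toy numeric example (all numbers invented) of what two one-IV experiments would miss: here the effect of a password meter reverses depending on the policy, so averaging over policies would hide the effect entirely.

```python
# Hypothetical cell means (e.g., mean password strength score)
# for a 2x2 design: meter presence x length policy.
means = {
    ("meter", "8-char"): 70,
    ("meter", "12-char"): 40,
    ("no_meter", "8-char"): 50,
    ("no_meter", "12-char"): 60,
}

# Simple effect of the meter at each policy level:
meter_effect_short = means[("meter", "8-char")] - means[("no_meter", "8-char")]    # +20
meter_effect_long = means[("meter", "12-char")] - means[("no_meter", "12-char")]   # -20

# A nonzero difference between simple effects is an interaction.
interaction = meter_effect_short - meter_effect_long
```

A one-IV experiment varying only the meter (at a single fixed policy) would report either +20 or -20 and never reveal that the sign depends on the other IV.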
Dependent variables
- What and how to measure? Construct validity, again!
- Performance (time, errors, FP/FN, etc.)
- Opinions/attitudes
- Audio recording, screen capture, keystrokes, copy-pasting behavior, etc.
- Demographics
- Multiple measures toward a higher-level construct?
NOT JUST EXPERIMENTS
Kinds of measurement studies
- Experimental
- Observational/correlational
- Quasi-experimental
Observational/correlational
- Observe that X and Y (don't) increase and decrease together / in opposition
- The researcher doesn't apply any control or treatment: just measures incidence
  - Does lead exposure correlate with crime rate?
- Directionality and third-variable problems are both issues
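The third-variable problem can be demonstrated with synthetic data (everything below is made up for illustration): two measures that never influence each other still correlate strongly when both are driven by a common cause.

```python
import random

random.seed(1)

# Z is an unobserved common cause; X and Y each depend on Z,
# but not on each other.
z = [random.gauss(0, 1) for _ in range(1000)]
x = [zi + random.gauss(0, 0.5) for zi in z]
y = [zi + random.gauss(0, 0.5) for zi in z]

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length samples."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    return cov / (va * vb) ** 0.5
```

Here `pearson(x, y)` comes out strongly positive even though X has no causal effect on Y, which is exactly why an observational correlation alone cannot establish causality.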
Quasi-experiments
- Subset of observational studies
- Can't randomize assignment
- But the experimenter controls something
  [Diagram: Group 1 -> Treatment -> Group 1; Group 2 -> (no treatment) -> Group 2]
Observational examples
- Cohort study
- Regression discontinuity
- BIBIFI example
Pluses and minuses
- Can measure things that simply can't be done with true experiments
- In general, association at best; causality is very hard to establish
  - Some statistical techniques exist to help
- Low internal validity: can you maximize it within the available constraints?