Dr. Allen Back Sep. 30, 2016
Extrapolation is Dangerous
And watch out for confounding variables. E.g.: a strong association between the number of firemen and the amount of damage at a fire does not mean that firemen cause the damage.
High Leverage Point: a data point $(x_i, y_i)$ with $x_i$ far from $\bar{x}$. Consequently the point might (depending on the actual value of $y_i$) have a large impact on the regression line.
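A minimal sketch of how leverage can be quantified. It assumes the standard simple-regression leverage formula $h_i = 1/n + (x_i - \bar{x})^2 / \sum_j (x_j - \bar{x})^2$, which is not stated on the slide. The data and the function names `fit_line` and `leverages` are made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = b1 * x + b0."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return b1, y_bar - b1 * x_bar


def leverages(xs):
    """Leverage of each point: 1/n + (x_i - x_bar)^2 / Sxx."""
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return [1 / n + (x - x_bar) ** 2 / sxx for x in xs]


# Hypothetical data: five points on the line y = 2x, plus one point far out in x.
xs = [1, 2, 3, 4, 5, 20]
ys_on_line = [2, 4, 6, 8, 10, 40]   # the far point agrees with the trend
ys_off_line = [2, 4, 6, 8, 10, 10]  # the far point disagrees with the trend

h = leverages(xs)
slope_on, _ = fit_line(xs, ys_on_line)
slope_off, _ = fit_line(xs, ys_off_line)

print("leverages:", [round(v, 3) for v in h])
print("slope, far point on the line: ", round(slope_on, 3))
print("slope, far point off the line:", round(slope_off, 3))
```

The far point has by far the largest leverage in both data sets, but it only changes the fitted slope noticeably when its y value departs from the trend of the other points.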
Was it Fair?
The first draft lottery during the Vietnam War: 366 balls labeled by dates. Mixed up and pulled out in a random order.
[Figures: scatterplot; boxplots for each month; scatterplot with line; correlation display]
Around 1 in a thousand chance of a correlation coefficient this far from 0 if the lottery was fair.
The balls were probably not mixed well enough.
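One way to check a statement like "1 in a thousand if the lottery was fair" is by simulation. This is a rough sketch that shuffles the 366 dates many times and counts how often the correlation between date and draw order is at least as far from 0 as a chosen threshold. The slide does not give the observed correlation, so the threshold `0.1` is a placeholder, and the function names are hypothetical.

```python
import random


def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def fair_lottery_tail_prob(r_threshold, n_sims=2000, n_days=366, seed=0):
    """Estimate P(|r| >= r_threshold) when the draw order is a uniform random shuffle."""
    rng = random.Random(seed)
    days = list(range(1, n_days + 1))  # day of the year
    order = days[:]                    # draw position assigned to each day
    hits = 0
    for _ in range(n_sims):
        rng.shuffle(order)
        if abs(correlation(days, order)) >= r_threshold:
            hits += 1
    return hits / n_sims


# Placeholder threshold; substitute the correlation read off the display.
p = fair_lottery_tail_prob(r_threshold=0.1)
print(f"estimated P(|r| >= 0.1) under a fair lottery: {p:.3f}")
```

Resolving a probability near 1 in a thousand would need many more simulated lotteries than the 2000 used here, so `n_sims` should be raised accordingly.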
How Many Rooms Can x Clean?
x crews working for a building contractor go out each night and clean y rooms. Can we understand the relationship?
[Figures: scatterplot; summary of Num; summary of RoomsCleaned; scatterplot with line; regression display]
RoomsCleaned = 3.70 Num + 1.78
[Figure: residual plot]
There are important deviations from the assumptions of an ideal linear regression model here.
Highlight of and Distance
The slope $b_1$ of Fare $= b_1 \cdot$ Distance $+ b_0$ is the average increase in fare per extra mile.
Fare = 177 + .079 Distance and Distance = 644 + 6.13 Fare are different lines! (Note $1/.079 \neq 6.13$.)
If you want to compute r on a TI-83/84, the place to look is STAT > CALC > LinReg. And ONCE, you need to set DiagnosticOn in the Catalog.
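A small sketch of why the two lines differ. For least-squares fits, the slope of y on x times the slope of x on y equals $r^2$, so the two slopes are reciprocals only when $|r| = 1$. With the slide's numbers, $.079 \times 6.13 \approx .48$. The distance and fare pairs are invented for illustration, and `regress` and `correlation` are hypothetical helper names.

```python
def regress(xs, ys):
    """Least-squares line for predicting ys from xs: returns (slope, intercept)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def correlation(xs, ys):
    """Pearson correlation coefficient."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical (distance in miles, fare in dollars) pairs, not the slide's data.
distance = [200, 400, 600, 900, 1200, 1500, 2000, 2500]
fare = [180, 230, 200, 260, 250, 330, 300, 390]

b_fare_on_dist, _ = regress(distance, fare)  # Fare = b1 * Distance + b0
b_dist_on_fare, _ = regress(fare, distance)  # Distance = b1' * Fare + b0'
r = correlation(distance, fare)

print("product of slopes:", round(b_fare_on_dist * b_dist_on_fare, 4))
print("r squared:        ", round(r ** 2, 4))
print("1 / slope(Fare on Distance):", round(1 / b_fare_on_dist, 3))
print("slope(Distance on Fare):    ", round(b_dist_on_fare, 3))
```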
Phrase about the regression of y on x: the proportion of the variance of y explained by the regression is $r^2$.
My view: right psychologically, but unclear at first glance what it means.
What it actually means is
$$\frac{\mathrm{Var}(\hat{y}_i)}{\mathrm{Var}(y_i)} = r^2,$$
where the variances refer to the one-variable data sets $\{y_i\}$ and $\{\hat{y}_i\}$.
My view: the companion statement
$$\frac{\mathrm{Var}(\text{Residuals})}{\mathrm{Var}(y_i)} = 1 - r^2$$
does really explain why $r^2$ near 1 says something important about the quality of the approximation offered by the regression model.
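Both variance ratios can be checked numerically. The sketch below fits a least-squares line to a small made-up data set and compares Var(fitted values)/Var(y) with $r^2$ and Var(residuals)/Var(y) with $1 - r^2$. The data and variable names are illustrative assumptions.

```python
def variance(vals):
    """Sample variance (divisor n - 1)."""
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals) / (len(vals) - 1)


# Hypothetical data, roughly linear with some scatter.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 2.9, 4.2, 4.8, 6.5, 6.1, 8.3, 8.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b1 = sxy / sxx              # slope of the regression of y on x
b0 = y_bar - b1 * x_bar     # intercept
r2 = sxy ** 2 / (sxx * syy)

y_hat = [b1 * xi + b0 for xi in x]
residuals = [yi - yh for yi, yh in zip(y, y_hat)]

explained = variance(y_hat) / variance(y)
unexplained = variance(residuals) / variance(y)

print("r^2:                  ", round(r2, 6))
print("Var(y_hat) / Var(y):  ", round(explained, 6))
print("Var(resid) / Var(y):  ", round(unexplained, 6))
print("1 - r^2:              ", round(1 - r2, 6))
```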
by Locality (rm outliers?, transform?)
Crime Rate vs. Housing Prices in 1996. Crime Rate is crimes per 1000; Housing Prices are in dollars.
[Figure: scatterplot]
With regression line: $\widehat{HP} = -577\,CR + 177\text{K}$, $r^2 = .06$ (SMALL)
[Figures: regression display; residuals]
Now analyze without the Center City outlier.
[Figure: scatterplot]
With regression line: $\widehat{HP} = -2290\,CR + 225\text{K}$, $r^2 = .18$ (vs. .06 before)
[Figures: regression display; residuals]
Now transform from $CR$ to $1/CR$.
[Figure: scatterplot]
With regression line (but Center City included): $\widehat{HP} = 1.3\text{M} \cdot \frac{1}{CR} + 97.9\text{K}$, $r^2 = .17$
[Figures: regression display; residuals]
For both men and women:
1) IQs average about 100
2) SD about 15
A large study showed:
1) For men with IQ of 140, average wife's IQ was 120.
2) For women with IQ of 120, average husband's IQ was 110.
3) Note the Z score of 140 is twice the Z score of 120.
4) The above kind of comparison is typical because of the two regression lines.
E.g. if $r = .5$:
1) $\hat{Z}_w = r Z_m$, so $Z_m = 2.667 \Rightarrow \hat{Z}_w = 1.333$.
2) $\hat{Z}_m = r Z_w$, so $Z_w = 1.333 \Rightarrow \hat{Z}_m = .667$.
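The arithmetic can be sketched in a few lines, using the slide's mean of 100, SD of 15, and the example value r = .5. The function name `predicted_spouse_iq` is hypothetical.

```python
MEAN_IQ = 100
SD_IQ = 15


def z_score(iq):
    """Standardize an IQ using the mean and SD from the slide."""
    return (iq - MEAN_IQ) / SD_IQ


def predicted_spouse_iq(iq, r):
    """Regression prediction: the predicted z-score is r times the given z-score."""
    z_hat = r * z_score(iq)
    return MEAN_IQ + z_hat * SD_IQ


r = 0.5  # example correlation from the slide
wife_given_husband_140 = predicted_spouse_iq(140, r)
husband_given_wife_120 = predicted_spouse_iq(120, r)

print("z-score of 140:", round(z_score(140), 3))
print("z-score of 120:", round(z_score(120), 3))
print("predicted wife's IQ for a husband at 140:", round(wife_given_husband_140, 1))
print("predicted husband's IQ for a wife at 120:", round(husband_given_wife_120, 1))
```

Each prediction sits closer to the mean than the value it is predicted from. That is why going from 140 to 120 and then from 120 back does not return to 140.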
Polio Vaccine
NFIP Vaccine Trials
                      Size   Rate (cases/100k)
Grade 2 Vaccine       125K   25
Grade 2 No Consent    125K   44
Grade 1,3 Control     725K   54

PHS Double Blind Vaccine Trials
                      Size   Rate (cases/100k)
Treatment             200K   28
Control               200K   71
No Consent            350K   46

NFIP result confusing, but PHS not.
Randomized control groups help a lot with unanticipated issues!
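As a quick arithmetic check, the rates in the two tables can be compared as ratios of the vaccinated rate to the control rate. The dictionary layout and key names are just one way to hold the slide's numbers.

```python
# Polio rates in cases per 100k, copied from the two tables above.
trials = {
    "NFIP": {"vaccine": 25, "control": 54, "no_consent": 44},
    "PHS": {"vaccine": 28, "control": 71, "no_consent": 46},
}

ratios = {}
for name, rates in trials.items():
    # Ratio below 1 means fewer cases per 100k among the vaccinated.
    ratios[name] = rates["vaccine"] / rates["control"]
    print(f"{name}: vaccinated rate / control rate = {ratios[name]:.2f}")
```

The randomized PHS comparison shows a larger relative reduction than the NFIP comparison, whose control group was not chosen at random.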
Portacaval Shunt Studies
51 Studies
                           Enthusiasm:
Design                     Marked   Moderate   None
No Controls                24       7          1
Controls, not randomized   10       3          2
Randomized controls        0        1          3
Gilbert '75
28 Social and Medical Interventions
++   21%
+    21%
0    46%
-    7%
--   4%
Gilbert '77
36 Surgical and Anaesthetic Innovations
innovation highly preferred                          14%
innovation preferred                                 19%
innovation a success but not much better             11%
innovation a disappointment but not much worse       28%
standard preferred                                   6%
standard highly preferred                            11%
Establishing Causation (Attempts)
Association strong.
Association consistent.
Higher doses give stronger responses.
Alleged cause precedes effect.
Alleged cause is plausible.
Rule out other plausible explanations.
This is hard to do reliably. An experiment is much clearer!
Basic Strategies
1) Control extraneous sources of variation.
2) Randomize to deal with uncontrollable sources of variation.
3) Replicate to increase accuracy and gain greater confidence in the scope of your conclusions.
4) Block when possible to increase accuracy/sensitivity and better control variability.
Sampling Words
Sample vs. Population
Sample Statistic vs. Population Parameter
Sampling Frame (not in your text?)
Voluntary Response Sample (not in your text?)
Convenience Sample
Biased Sample
Simple Random Sample (SRS)
Census
Strata
Stratified Random Sample
Cluster Sample
Multistage Sample Design
Matching in an observational study cohort
Undercoverage (not in your text?)
Non-Response Bias
Response Bias (not in your text?)
Leading Questions
Sampling Variability
Stratification
Strata: groups of homogeneous individuals.
Stratified Random Sample: same probability of choice within each group.
Advantages include:
Every stratum well represented.
Can be more accurate for a given sample size.
Strata with greater variability should be better represented.
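A minimal sketch of a stratified random sample, assuming proportional allocation (the same sampling fraction in every stratum). The population, the class-year strata, and the function name `stratified_sample` are all invented for illustration. Sampling more heavily from the more variable strata, as the last point suggests, would need a different fraction per stratum.

```python
import random


def stratified_sample(population, stratum_of, fraction, seed=0):
    """Draw a simple random sample of the same fraction from each stratum."""
    rng = random.Random(seed)
    strata = {}
    for unit in population:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for label in sorted(strata):
        members = strata[label]
        k = max(1, round(fraction * len(members)))  # at least one unit per stratum
        sample.extend(rng.sample(members, k))       # SRS without replacement
    return sample


# Hypothetical population of 1000 students, stratified by class year.
sizes = {"freshman": 400, "sophomore": 300, "junior": 200, "senior": 100}
population = [(f"{year}-{i}", year) for year, size in sizes.items() for i in range(size)]

sample = stratified_sample(population, stratum_of=lambda unit: unit[1], fraction=0.1)

counts = {}
for _, year in sample:
    counts[year] = counts.get(year, 0) + 1
print(counts)
```

Every stratum appears in the sample in proportion to its size, which a simple random sample of the same total size would only achieve on average.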
Types of Bias
Response bias vs. voluntary response bias vs. non-response bias?
Response Bias: problems in the questions or how they are asked.
Voluntary Response Bias: problems in surveys where only volunteers participate.
Non-Response Bias: problems associated with which people are missing in the final results.
Undercoverage: groups somewhat missing from the sampling frame.
Experiments
Observational Study vs. Experiment
Prospective vs. Retrospective Study
Factor in an experiment
Level
Treatment
Control Group
Single-Blind vs. Double-Blind
One Factor vs. Two Factor
Placebo
Placebo Effect
Block
Block Design
Matched Pairs Design
Confounding Variables
Statistically Significant Effect
Factors and Levels
Factors vs. Levels vs. Treatments?
Factor in an experiment: variable being manipulated.
Levels: values of a factor.
Treatment: what is actively done to the experimental units.
Block Related
Block vs. Block Design vs. Matched Pairs Design
Block: homogeneous group, similar in some important way.
Block Design: random assignment within each block.
Matched Pairs Design: block size of 2.
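A rough sketch of random assignment within blocks, with a matched pairs design as the special case of blocks of size 2. The subjects, pair labels, and the function name `randomized_block_assignment` are hypothetical.

```python
import random


def randomized_block_assignment(units, block_of, treatments, seed=0):
    """Shuffle the units inside each block, then deal out treatments in turn."""
    rng = random.Random(seed)
    blocks = {}
    for unit in units:
        blocks.setdefault(block_of(unit), []).append(unit)
    assignment = {}
    for label in sorted(blocks):
        members = blocks[label][:]
        rng.shuffle(members)  # the randomization happens separately in each block
        for i, unit in enumerate(members):
            assignment[unit] = treatments[i % len(treatments)]
    return assignment


# Hypothetical matched pairs: five pairs of similar subjects, two per pair.
units = [(pair_id, member) for pair_id in range(1, 6) for member in ("a", "b")]

assignment = randomized_block_assignment(
    units,
    block_of=lambda unit: unit[0],       # the block is the pair id
    treatments=["treatment", "control"],
)

for unit in units:
    print(unit, "->", assignment[unit])
```

With blocks of size 2 and two treatments, each pair ends up with exactly one treated and one control subject. Which member gets which is decided by the shuffle.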