Dr. Allen Back. Sep. 30, 2016

Extrapolation is Dangerous

And watch out for confounding variables. For example, a strong association between the number of firemen and the amount of damage at a fire does not mean that the firemen cause the damage: bigger fires bring both more firemen and more damage.

High Leverage Point: a data point $(x_i, y_i)$ with $x_i$ far from $\bar{x}$. Consequently, depending on the actual value of $y_i$, the point might have a large impact on the regression line.
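A minimal sketch of leverage, assuming the standard simple-regression formula $h_i = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum_j (x_j - \bar{x})^2}$. The data points and the helper names `leverages` and `ls_slope` are hypothetical, chosen only so that one point has $x_i$ far from $\bar{x}$.

```python
def leverages(xs):
    """Leverage of each point in a simple linear regression on xs."""
    n = len(xs)
    xbar = sum(xs) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    return [1 / n + (x - xbar) ** 2 / sxx for x in xs]


def ls_slope(xs, ys):
    """Least-squares slope b1 = Sxy / Sxx."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical data: five points close to y = 2x, plus one point with x far from xbar.
xs = [1, 2, 3, 4, 5, 20]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 10.0]

h = leverages(xs)
print([round(v, 2) for v in h])      # the last point has by far the largest leverage
print(ls_slope(xs[:-1], ys[:-1]))    # slope without the far point
print(ls_slope(xs, ys))              # slope with it: pulled strongly toward the far point
```

The leverages always sum to 2 (one for each fitted coefficient), so a single point near 1 is taking up most of that budget.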

Was it Fair?

The first draft lottery during the Vietnam War used 366 balls labeled by dates. The balls were mixed up and pulled out in a random order.

[Figures: scatterplot; boxplots for each month; scatterplot with line; correlation display.]

There is around a 1 in a thousand chance of a correlation coefficient this far from 0 if the lottery was fair. The balls were probably not mixed well enough.
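The "1 in a thousand" statement can be checked by simulation. This is a sketch, not the slide's own computation: it assumes a fair lottery means every ordering of the 366 dates is equally likely, pairs each date's day-of-year with the order in which it was drawn, and estimates the $|r|$ that fair lotteries exceed only 0.1% of the time. The slides do not give the observed $r$, so the sketch only computes the cutoff; the function names are hypothetical.

```python
import math
import random


def pearson_r(xs, ys):
    """Sample correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def fair_lottery_r(rng, n_days=366):
    """r between day of year and draft number for one perfectly mixed drawing."""
    days = list(range(1, n_days + 1))
    draft_numbers = days[:]
    rng.shuffle(draft_numbers)  # fair lottery: every ordering equally likely
    return pearson_r(days, draft_numbers)


def abs_r_cutoff(n_sims=10_000, tail=0.001, seed=1):
    """Simulated |r| that a fair lottery exceeds only about `tail` of the time."""
    rng = random.Random(seed)
    rs = sorted(abs(fair_lottery_r(rng)) for _ in range(n_sims))
    return rs[round((1 - tail) * n_sims) - 1]


cutoff = abs_r_cutoff()
print(f"A fair lottery gives |r| above {cutoff:.3f} about 1 time in 1000")
```

Under a normal approximation the cutoff should land near $3.3/\sqrt{365} \approx 0.17$; any observed correlation farther from 0 than that is the kind of result the slide calls suspicious.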

How Many Rooms Can x Clean?

x crews working for a building contractor go out each night and clean y rooms. Can we understand the relationship?

[Figures: scatterplot; numerical summaries of Num and RoomsCleaned; scatterplot with line; regression display.]

Fitted line: $\widehat{\text{RoomsCleaned}} = 3.70\,\text{Num} + 1.78$.

[Figure: residual plot.]

There are important deviations from the assumptions of an ideal linear regression model here.
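A small sketch of how the fitted line is used, taking the slide's coefficients at face value. The observation of 18 rooms for 4 crews is hypothetical, included only to show how a residual (observed minus predicted) is computed; the residual plot graphs these values against Num.

```python
def predicted_rooms(num_crews):
    """Fitted value from the slide's line: RoomsCleaned-hat = 3.70 * Num + 1.78."""
    return 3.70 * num_crews + 1.78


def residual(num_crews, rooms_observed):
    """Residual = observed value - fitted value."""
    return rooms_observed - predicted_rooms(num_crews)


print(predicted_rooms(4))   # fitted rooms cleaned by 4 crews
print(residual(4, 18))      # hypothetical night: 4 crews cleaned 18 rooms
```

Each extra crew adds about 3.70 rooms to the prediction. Patterns in the residual plot, such as curvature or a spread that changes with Num, are the kind of deviation from the ideal model that the slide warns about.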

Highlights: Fare and Distance

The slope $b_1$ of $\widehat{\text{Fare}} = b_1\,\text{Distance} + b_0$ is the average increase in fare per extra mile. $\widehat{\text{Fare}} = 177 + .079\,\text{Distance}$ and $\widehat{\text{Distance}} = 644 + 6.13\,\text{Fare}$ are different lines! (Note $1/.079 \neq 6.13$.) If you want to compute $r$ on a TI-83/84, the place to look is STAT > CALC > LinReg. And ONCE, you need to turn on DiagnosticOn from the Catalog.
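Why the two lines differ can be seen by fitting both regressions to the same data. This sketch uses hypothetical (distance, fare) pairs, not the slide's data set; it shows that the product of the two least-squares slopes equals $r^2$, so the slope of Distance on Fare is $1/b_1$ only when $|r| = 1$.

```python
import math


def fit(xs, ys):
    """Least-squares line y-hat = b0 + b1*x for the regression of ys on xs."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


def corr(xs, ys):
    """Correlation coefficient r."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical (distance in miles, fare in dollars) pairs, for illustration only.
distance = [150, 400, 650, 900, 1400, 2100]
fare = [120, 160, 190, 260, 280, 350]

b0_fare, b1_fare = fit(distance, fare)   # regression of Fare on Distance
b0_dist, b1_dist = fit(fare, distance)   # regression of Distance on Fare
r = corr(distance, fare)

print(b1_fare * b1_dist, r ** 2)  # these agree: product of the two slopes is r^2
print(1 / b1_fare, b1_dist)       # these differ unless |r| = 1
```

If both of the slide's lines come from the same data, then $.079 \times 6.13 \approx .48$ would be $r^2$, giving $r \approx .70$.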

A common phrase about the regression of $y$ on $x$: "The proportion of the variance of $y$ explained by the regression is $r^2$."

My view: right psychologically, but unclear at first glance what it means. What it actually means is
$$\frac{\operatorname{Var}(\hat{y}_i)}{\operatorname{Var}(y_i)} = r^2,$$
where the variances refer to the one-variable data sets $\{y_i\}$ and $\{\hat{y}_i\}$.

My view: the companion statement
$$\frac{\operatorname{Var}(\text{residuals})}{\operatorname{Var}(y_i)} = 1 - r^2$$
really does explain why $r^2$ near 1 says something important about the quality of the approximation offered by the regression model.
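Both variance ratios can be checked numerically. This sketch fits a least-squares line to hypothetical data and compares the variance of the fitted values and of the residuals with the variance of $y$; population variances are used throughout, and any consistent choice gives the same ratios.

```python
import math
from statistics import pvariance


def fit(xs, ys):
    """Least-squares line y-hat = b0 + b1*x."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


def corr(xs, ys):
    """Correlation coefficient r."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5, 6, 7]
ys = [2.0, 2.9, 4.4, 4.1, 6.3, 5.8, 7.9]

b0, b1 = fit(xs, ys)
fitted = [b0 + b1 * x for x in xs]
residuals = [y - f for y, f in zip(ys, fitted)]
r_squared = corr(xs, ys) ** 2

explained = pvariance(fitted) / pvariance(ys)        # Var(y-hat) / Var(y)
unexplained = pvariance(residuals) / pvariance(ys)   # Var(residuals) / Var(y)
print(explained, r_squared)
print(unexplained, 1 - r_squared)
```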

Crime Rate vs. Housing Prices by Locality (remove outliers? transform?)

Crime rate vs. housing prices in 1996. Crime Rate (CR) is crimes per 1000; Housing Prices (HP) are in dollars.

[Figures: scatterplot; scatterplot with regression line; regression display; residuals.]

Regression line: $\widehat{HP} = -577\,CR + 177\text{K}$, $r^2 = .06$ (SMALL).

Now analyze without the Center City outlier.

[Figures: scatterplot; scatterplot with regression line; regression display; residuals.]

Regression line: $\widehat{HP} = -2290\,CR + 225\text{K}$, $r^2 = .18$ (vs. .06 before).

Now transform from CR to $1/CR$.

[Figures: scatterplot; scatterplot with regression line; regression display; residuals.]

Regression line: $\widehat{HP} = 1.3\text{M}\,(1/CR) + 97.9\text{K}$, $r^2 = .17$. But Center City is included.
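To compare the three fitted models, this sketch evaluates each at a few crime rates. It uses the rounded coefficients shown on the slides and assumes the slopes on CR are negative (consistent with the positive slope on $1/CR$); the chosen crime rates are arbitrary.

```python
def hp_all(cr):
    """HP-hat = -577 CR + 177K (Center City included)."""
    return -577 * cr + 177_000


def hp_no_center_city(cr):
    """HP-hat = -2290 CR + 225K (Center City removed)."""
    return -2290 * cr + 225_000


def hp_reciprocal(cr):
    """HP-hat = 1.3M * (1/CR) + 97.9K (Center City included)."""
    return 1_300_000 / cr + 97_900


for cr in (10, 20, 40):
    print(cr, hp_all(cr), hp_no_center_city(cr), round(hp_reciprocal(cr)))
```

The reciprocal model is curved in CR: its predictions fall steeply at low crime rates and flatten out at high ones, which lets it fit the bulk of the localities without being dominated by the single high-CR outlier.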

For both men and women:
1. IQs average about 100.
2. The SD is about 15.

A large study showed:
1. For men with an IQ of 140, the average wife's IQ was 120.
2. For women with an IQ of 120, the average husband's IQ was 110.
3. Note the Z score of 140 is twice the Z score of 120.
4. This kind of comparison is typical because there are two different regression lines.

For example, if $r = .5$:
1. $\hat{Z}_w = r Z_m$, so $Z_m = 2.667$ gives $\hat{Z}_w = 1.333$.
2. $\hat{Z}_m = r Z_w$, so $Z_w = 1.333$ gives $\hat{Z}_m = .667$.
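The same calculation in code: convert to a Z score, multiply by $r$, and convert back. This sketch uses the slide's mean of 100, SD of 15 and $r = .5$; the function names are hypothetical.

```python
MEAN, SD = 100, 15  # IQ scale for both men and women (from the slide)


def z_score(iq):
    return (iq - MEAN) / SD


def predicted_spouse_iq(iq, r=0.5):
    """Regression in standard units: predicted spouse Z = r * own Z."""
    return MEAN + SD * (r * z_score(iq))


print(predicted_spouse_iq(140))  # average wife's IQ for husbands with IQ 140
print(predicted_spouse_iq(120))  # average husband's IQ for wives with IQ 120
```

Applying the same rule in both directions is exactly why the two comparisons are not reverses of each other: each prediction moves only halfway (by the factor $r$) back toward the mean.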

Polio Vaccine

NFIP Vaccine Trials
Group                 Size   Rate (cases/100k)
Grade 2 Vaccine       125K   25
Grade 2 No Consent    125K   44
Grade 1,3 Control     725K   54

PHS Double-Blind Vaccine Trials
Group                 Size   Rate (cases/100k)
Treatment             200K   28
Control               200K   71
No Consent            350K   46

The NFIP result is confusing, but the PHS result is not: in the NFIP design only consenting Grade 2 children were vaccinated, while Grade 1 and 3 children served as controls, so consent and grade level are confounded with treatment. Randomized control groups help a lot with unanticipated issues!
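Converting the PHS rates back into case counts makes the comparison concrete. This sketch uses the slide's group sizes and rates; since the rates are rounded, the counts are approximate.

```python
# (group, size, rate per 100,000) from the PHS double-blind trial on the slide
phs = [("Treatment", 200_000, 28), ("Control", 200_000, 71), ("No consent", 350_000, 46)]


def implied_cases(size, rate_per_100k):
    """Approximate number of cases implied by a group size and a rate per 100,000."""
    return size * rate_per_100k / 100_000


for name, size, rate in phs:
    print(name, implied_cases(size, rate))

print(71 / 28)  # control rate relative to treatment rate
```

Because the treatment and control groups are the same size and were formed by randomization, the roughly 2.5-fold difference in rates can be attributed to the vaccine rather than to who consented.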

Portacaval Shunt Studies (51 studies)

                            Enthusiasm
Design                      Marked   Moderate   None
No controls                 24       7          1
Controls, not randomized    10       3          2
Randomized controls         0        1          3
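Reading the table by rows shows the pattern: the weaker the design, the larger the share of studies reporting marked enthusiasm. This sketch just recomputes those row percentages from the slide's counts.

```python
# Number of studies with marked / moderate / no enthusiasm, by design (from the slide)
table = {
    "No controls": (24, 7, 1),
    "Controls, not randomized": (10, 3, 2),
    "Randomized controls": (0, 1, 3),
}

for design, counts in table.items():
    total = sum(counts)
    print(f"{design}: {total} studies, {counts[0] / total:.0%} with marked enthusiasm")
```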

Gilbert 75: 28 Social and Medical Interventions
++    21%
+     21%
0     46%
-      7%
--     4%

Gilbert 77: 36 Surgical and Anaesthetic Innovations
Innovation highly preferred                         14%
Innovation preferred                                19%
Innovation a success but not much better            11%
Innovation a disappointment but not much worse      28%
Standard preferred                                   6%
Standard highly preferred                           11%

Establishing Causation (Attempts)
- The association is strong.
- The association is consistent.
- Higher doses give stronger responses.
- The alleged cause precedes the effect.
- The alleged cause is plausible.
- Other plausible explanations are ruled out.

This is hard to do reliably. An experiment is much clearer!

Basic Strategies
1) Control extraneous sources of variation.
2) Randomize to deal with uncontrollable sources of variation.
3) Replicate to increase accuracy and gain greater confidence in the scope of your conclusions.
4) Block when possible to increase accuracy/sensitivity and better control variability.
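Strategy 2 in its simplest form is a completely randomized design. A minimal sketch, assuming units are interchangeable labels and groups should be as equal in size as possible; the function name, the 20 units and the treatment names are hypothetical.

```python
import random


def randomize(units, treatments, seed=None):
    """Completely randomized design: shuffle the units, then deal them out to the treatments in turn."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    return {t: shuffled[i::len(treatments)] for i, t in enumerate(treatments)}


groups = randomize(range(20), ["vaccine", "placebo"], seed=42)
print(groups["vaccine"])
print(groups["placebo"])
```

Letting chance decide which units get which treatment is what spreads the uncontrollable sources of variation evenly, on average, across the groups.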

Sampling Words: Sample vs. Population; Sample Statistic vs. Population Parameter; Sampling Frame (not in your text?); Voluntary Response Sample (not in your text?); Convenience Sample; Biased Sample; Simple Random Sample (SRS).

Sampling Words: Census; Strata; Stratified Random Sample; Cluster Sample; Multistage Sample Design.

Sampling Words: Matching in an observational study; Cohort; Undercoverage (not in your text?); Non-Response Bias; Response Bias (not in your text?); Leading Questions; Sampling Variability.

Stratification

Strata: groups of homogeneous individuals.
Stratified Random Sample: a simple random sample taken within each group, so individuals in the same stratum have the same probability of being chosen.
Advantages include:
- Every stratum is well represented.
- It can be more accurate for a given sample size.
- Strata with greater variability should be better represented (sampled more heavily).
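A minimal sketch of a stratified random sample: group the population by stratum, then take an SRS of a chosen size from each. The allocation (how many to draw per stratum) is an input here; the student population and the 6/4 split below are hypothetical.

```python
import random


def stratified_sample(population, stratum_of, n_per_stratum, seed=None):
    """Take an SRS of the requested size within each stratum.

    population: list of units
    stratum_of: function mapping a unit to its stratum label
    n_per_stratum: dict mapping stratum label -> sample size
    """
    rng = random.Random(seed)
    strata = {}
    for unit in population:
        strata.setdefault(stratum_of(unit), []).append(unit)
    # random.sample draws without replacement, giving an SRS within the stratum
    return {label: rng.sample(strata[label], k) for label, k in n_per_stratum.items()}


# Hypothetical population: 60 first-years and 40 seniors.
students = [(i, "first-year" if i < 60 else "senior") for i in range(100)]
sample = stratified_sample(students, lambda s: s[1], {"first-year": 6, "senior": 4}, seed=1)
print(sample)
```

Here the allocation is proportional to stratum size; following the slide's last point, a more variable stratum could instead be given a larger share.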

Types of Bias: response bias vs. voluntary response bias vs. non-response bias?
- Response Bias: problems in the questions or in how they are asked.
- Voluntary Response Bias: problems in surveys where only volunteers participate.
- Non-Response Bias: problems caused by which people are missing from the final results.
- Undercoverage: groups underrepresented in, or missing from, the sampling frame.

Experiment Words: Experiment vs. Observational Study; Prospective vs. Retrospective Study; Factor in an experiment; Level; Treatment.

Experiment Words: Control Group; Single-Blind vs. Double-Blind; One Factor vs. Two Factor; Placebo; Placebo Effect.

Experiment Words: Block; Block Design; Matched Pairs Design; Confounding Variables; Statistically Significant Effect.

Factors and Levels: Factors vs. Levels vs. Treatments?
- Factor in an experiment: the variable being manipulated.
- Levels: the values of a factor.
- Treatment: what is actively done to the experimental units.

Block Related: Block vs. Block Design vs. Matched Pairs Design
- Block: a homogeneous group, similar in some important way.
- Block Design: treatments are randomly assigned within each block.
- Matched Pairs Design: a block design with blocks of size 2.
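Randomizing within blocks can be sketched the same way as a completely randomized design, except that the shuffle happens separately inside each block. With blocks of size 2 and two treatments, this is a matched pairs design. The pairs, names and treatment labels below are hypothetical.

```python
import random


def block_randomize(blocks, treatments, seed=None):
    """Randomized block design: randomly assign treatments separately within each block.

    blocks: dict mapping block label -> list of units
    Returns a dict mapping unit -> treatment.
    """
    rng = random.Random(seed)
    assignment = {}
    for units in blocks.values():
        shuffled = list(units)
        rng.shuffle(shuffled)  # independent randomization inside this block
        for i, unit in enumerate(shuffled):
            assignment[unit] = treatments[i % len(treatments)]
    return assignment


# Hypothetical matched pairs (blocks of size 2), e.g. subjects matched on age.
pairs = {"pair1": ["Ann", "Bea"], "pair2": ["Cal", "Dee"], "pair3": ["Eve", "Fay"]}
assignment = block_randomize(pairs, ["A", "B"], seed=7)
print(assignment)
```

Every block receives every treatment, so differences between blocks cannot masquerade as treatment effects.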