Outline. Practice. Confounding Variables. Discuss. Observational Studies vs Experiments

Outline

Finish sampling slides from Tuesday.
Study design: what do you do with the subjects/units once you select them? (OI Sections 1.4-1.5)
Observational studies vs. experiments
Descriptive statistics and data visualization (OI Sections 1.6-1.7)
Lab: Introduction to R and RStudio

Practice

Discuss with a partner:
1. A researcher is interested in the opinions of MSU students about updating gym equipment. A surveyor stands at the gym entrance door, uses the next 50 people who enter as a sample, and asks each their opinion about updating gym equipment. What type of sample is this, and what type of bias might it have?
2. There are 25 sections of Stat 216 offered at MSU this semester. How would you use cluster sampling to choose a sample from all Stat 216 students? Multistage sampling?
3. What are the two main differences between a stratified random sample and a cluster sample?

Observational Studies vs Experiments

An observational study is a study that observes individuals and measures variables, but does not attempt to manipulate or influence the responses.
In prospective observational studies, investigators choose a sample and collect new data generated from that sample: they look forward in time.
In retrospective observational studies, investigators look backward in time and use data that have already been collected.
Example: In a case-control study, the researchers select a sample of cases (e.g., lung-cancer patients) and a sample of controls (e.g., patients similar to the cases but without lung cancer) and ask them about past behavior (e.g., smoking).

Retrospective or prospective?
A study that follows marijuana users in Colorado for 5 years.
A study of illegal immigrant activity last year in Arizona.

Observational Studies vs Experiments

An experiment is a study in which treatment(s) are deliberately imposed on individuals in order to observe their response.
A randomized experiment is an experiment where treatments are randomly assigned to subjects. The treatments are levels of an explanatory variable.
Don't confuse random assignment with random sampling!

Confounding Variables

In an observational study, we cannot show cause-and-effect relationships because the response may be affected by some variable(s) other than the ones being measured.
A confounding variable is a variable that both:
1. is related to the explanatory variable, and
2. may have an effect on the response variable.

Discuss

What are the disadvantages and advantages of an observational study compared to an experiment?

In a randomized experiment, the random assignment of levels of the explanatory variable should balance out (on average) any possible confounding variables, allowing us to examine cause-and-effect relationships.
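
The difference between random sampling and random assignment can be sketched in a few lines of Python. This is only an illustration: the population of ID numbers, the sample size of 40, and the function names `random_sample` and `random_assignment` are assumptions made for the example, not part of the slides.

```python
import random

def random_sample(population, n, seed=None):
    """Random sampling: decide WHICH units from the population enter the study."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def random_assignment(units, treatments, seed=None):
    """Random assignment: shuffle the units already in the study, then deal
    them out to the treatments in turn, so group sizes differ by at most one."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, unit in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(unit)
    return groups

# Hypothetical population of 1000 ID numbers (not data from the slides).
population = list(range(1, 1001))
sample = random_sample(population, 40, seed=1)                         # generalizability
groups = random_assignment(sample, ["treatment", "control"], seed=2)  # cause-and-effect
print({name: len(members) for name, members in groups.items()})
```

Random sampling affects which population the results extend to; random assignment is what balances possible confounding variables across the treatment groups on average.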

Example: Houndstongue

Houndstongue (a noxious weed) is found in abundance on private and public lands that have been grazed by cattle. Houndstongue is rarely found on lands that have been grazed by mountain goats. One investigator concluded that houndstongue infestations could be reduced by importing mountain goats to the infested areas.
1. Variables and types? Explanatory and response?
2. Sampling bias? To what population can we generalize?
3. Observational study or experiment? Prospective or retrospective?
4. Confounding variables?

Example: Mozart and spatial reasoning

A 1993 study by UCI researchers Rauscher, Shaw, and Ky, published in Nature, tested 36 college students' performance on a set of three standard IQ spatial reasoning tasks, each preceded by one of three conditions: (1) listening to 10 minutes of a Mozart sonata, (2) listening to 10 minutes of relaxation instruction, or (3) listening to 10 minutes of silence. All three conditions were tested on each student, in a random order (to compensate for a possible practice effect). Results showed that IQ scores were significantly higher after the Mozart condition than after the other two.
1. Variables and types? Explanatory and response?
2. Sampling bias? To what population can we generalize?
3. Observational study or experiment? Prospective or retrospective?
4. Confounding variables?

Example: Does Prayer Lower Blood Pressure?

A study followed a random sample of 2391 people for 6 years and concluded that "Attending religious services lowers blood pressure more than tuning into religious TV or radio."
USA Today headline: "Prayer can lower blood pressure"
1. Variables and types? Explanatory and response?
2. Sampling bias? To what population can we generalize?
3. Observational study or experiment? Prospective or retrospective?
4. Confounding variables?

Example: Aspirin and heart attacks

Each of 22,071 male physicians between the ages of 40 and 84 was randomly assigned to one of two treatment groups: (1) one aspirin per day, (2) one placebo tablet per day. Results showed that the aspirin group had a lower incidence of heart attacks than the placebo group.
1. Variables and types? Explanatory and response?
2. Sampling bias? To what population can we generalize?
3. Observational study or experiment? Prospective or retrospective?
4. Confounding variables?

What can we conclude?

How the study is conducted (cause-and-effect): randomized experiment or observational study.
How the sample is collected (generalizability): random sample or non-random sample from the population.

Random sample, randomized experiment: Causal relationship, and can extend results to population.
Random sample, observational study: Cannot conclude causal relationship, but can extend results to population.
Non-random sample, randomized experiment: Causal relationship, but cannot extend results to a population.
Non-random sample, observational study: Cannot conclude causal relationship, and cannot extend results to a population.

Principles of Experimental Design

1. Controlling
2. Randomization (random assignment)
3. Replication
4. Blocking

Experimental Design: Control

An extraneous factor is a variable that is not of primary interest and yet affects the response variable.
An extraneous factor is called a confounding variable only if it is also related to the explanatory variable.
In an experiment, researchers try to hold extraneous factors constant for all units so that the effects of the extraneous factors are not confounded with the factors of interest.

Experimental Design: Control

Example: If the treatment is a new drug administered through a pill, researchers will give the control group a placebo pill: a pill that has no active treatment, yet looks and tastes the same as the new drug.
Example: The subjects and the researchers recording the response should be blind (they do not know which treatment was received) to avoid unconscious expectations. If both subjects and researchers are blind, we say the study is double-blind. If only one is blind, we say the study is single-blind.

Experimental Design: Randomization

Levels of the explanatory variable (treatments) are randomly assigned to experimental units in order to create similar experimental groups. This balances out values of the extraneous factors.

Experimental Design: Replication

Within one experiment, we use replication by assigning many experimental units (large sample sizes) to each treatment group to reduce the role of random variation due to uncontrolled extraneous variables.
Groups of scientists should replicate entire studies to verify earlier findings.

Experimental Design: Blocking

Prior to random assignment, experimental units are classified into homogeneous subgroups, or blocks, so that the extraneous factors are held constant within each block. Treatments are randomly assigned to units within each block.
"Block what you can, randomize what you cannot." (George Box)
Discuss: What is the difference between blocking in an experiment and stratifying in choosing a sample?

Two Basic Experimental Designs

1. Completely Randomized Design: Experimental units are randomly assigned to each treatment (using experimental design principles 1-3).
2. Randomized Block Design: Experimental units are classified into blocks that are similar with respect to extraneous variable(s), then units are randomly assigned to treatments independently within each block (using experimental design principles 1-4).
A matched pairs design is a randomized block design where each block consists of a pair of experimental units.
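
The two designs can be compared with a minimal Python sketch. Everything concrete here is an assumption for illustration: the 12 units, the made-up "novice"/"expert" blocking variable, the treatment labels "A" and "B", and the helper names `completely_randomized` and `randomized_block`.

```python
import random
from collections import defaultdict

def completely_randomized(units, treatments, rng):
    """Shuffle all units together, then deal them out to the treatments."""
    shuffled = list(units)
    rng.shuffle(shuffled)
    return {unit: treatments[i % len(treatments)] for i, unit in enumerate(shuffled)}

def randomized_block(units, block_of, treatments, rng):
    """Group units into blocks first, then randomize separately within each block."""
    blocks = defaultdict(list)
    for unit in units:
        blocks[block_of[unit]].append(unit)
    assignment = {}
    for members in blocks.values():
        assignment.update(completely_randomized(members, treatments, rng))
    return assignment

# Hypothetical experimental units and a hypothetical blocking variable.
rng = random.Random(0)
units = [f"unit{i}" for i in range(12)]
block_of = {u: ("novice" if i < 6 else "expert") for i, u in enumerate(units)}

crd = completely_randomized(units, ["A", "B"], rng)
rbd = randomized_block(units, block_of, ["A", "B"], rng)

for level in ("novice", "expert"):
    in_block = [rbd[u] for u in units if block_of[u] == level]
    print(level, in_block.count("A"), in_block.count("B"))
```

In the block design every block is guaranteed an even split between treatments, whereas the completely randomized design balances the blocking variable only on average. A matched pairs design corresponds to blocks of size two.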

Practice

Suppose a researcher wants to know whether taking caffeine an hour before swimming affects the time it takes swimmers to complete a 1-mile swim, and that 50 volunteers are available for the study.
1. What are the experimental units?
2. What is the response variable? The explanatory variable? What are the types of these variables?
3. What should be the treatments?
4. What are some potential extraneous factors?
5. Describe how you could design a:
a. Completely randomized design
b. Randomized block design
c. Matched-pairs design

Statistical Investigation Process (Tintle et al., 2016)

[Figure: the statistical investigation process, annotated with "Sampling methods and experimental design" and "Descriptive statistics (summary statistics) and plots"]

SUMMARIZING DATA
Descriptive Statistics and Data Visualization

Summary Statistics

Type of variable: Summary statistics
One categorical: frequency (count) or relative frequency (proportion) in each category
Two categorical: contingency (two-way) table
One quantitative: Center: mean, median, mode. Variability: range, quartiles, inter-quartile range (IQR), variance, standard deviation. Shape: skewness, kurtosis (we won't cover these).
Two quantitative: correlation coefficient ($r$), coefficient of determination ($R^2$)

Plots

Type of variable: Plot
One categorical: Bar plot (do not use pie charts! why?), mosaic plot
Two categorical: Segmented bar plot (or side-by-side bar plot)
One quantitative: Histogram, density plot (smoothed histogram), dotplot, boxplot, stem-and-leaf plot
Two quantitative: Scatterplot
One quantitative and one categorical: Side-by-side boxplots
Two quantitative and one categorical: Scatterplot with different colors/shapes for categories

Example: Nightlights and Nearsightedness

Survey of n = 479 children. Those who slept with a nightlight or in a fully lit room before age 2 had a higher incidence of nearsightedness (myopia) later in childhood.
What are the observational units?
Note that now we are measuring two categorical variables. What are they?
Is there an association between the two variables? Why?
How could we construct a bar graph for these data? How could we detect association between the two variables in the bar graph?
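
One way to look for association between two categorical variables is to compare the percent in each response category within each explanatory category. The Python sketch below does this for a two-way table. The counts are made up purely for illustration and are NOT the nightlight study's data; the function name `row_percentages` and the category labels are likewise assumptions.

```python
def row_percentages(table):
    """Convert counts to percentages within each explanatory category (row)."""
    result = {}
    for row_label, counts in table.items():
        total = sum(counts.values())
        result[row_label] = {col: 100 * c / total for col, c in counts.items()}
    return result

# Made-up counts for illustration only; these are NOT the study's data.
counts = {
    "darkness":   {"no myopia": 90, "myopia": 10},
    "nightlight": {"no myopia": 60, "myopia": 40},
}

percentages = row_percentages(counts)
for lighting, pct in percentages.items():
    print(lighting, {k: round(v, 1) for k, v in pct.items()})
```

If the percentages in each response category were about the same in every row, the bars of a segmented bar plot would look alike and there would be little evidence of association; rows whose percentages differ noticeably suggest an association.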

Example: Nightlights and Nearsightedness

Response: Degree of Myopia
Explanatory: Amount of Sleeptime Lighting
You could create a segmented bar plot of these data by stacking the three colored bars within each lighting condition.

Bar Graphs: Important Notes

When creating a bar graph displaying two categorical variables:
Categories of explanatory variable → x-axis.
Categories of response variable → differing colors/shadings; include in legend.
The y-axis reports row percentages (relative frequencies): the percent (not frequency) in each response category within each explanatory category.
Heights of bars in each explanatory variable category should add to 100%.
Always include x-axis and y-axis labels!

What to look for in the distribution of one quantitative variable?

1. Center: where is the distribution centered?
2. Variability (spread): how spread out are the values of the variable?
3. Shape: what shape is the distribution? (e.g., symmetric, right/positive skewed, left/negative skewed, unimodal, bimodal, multimodal)
4. Outliers: are there any observations that do not fit the overall pattern of the distribution?

[Figure: Histogram of 272 Eruption Times for Old Faithful Geyser; x-axis: Length of Eruption (min); y-axis: Frequency]

Center

Notation for raw data: You have $n$ observations of a quantitative variable, denoted by $x_1, x_2, \ldots, x_n$. $n$ is called the sample size.
Mean = arithmetic average = x-bar:
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum x_i}{n}$$
The mean is the balancing point: the value such that the sum of all the deviations from the mean is zero:
$$\sum_{i=1}^{n} (x_i - \bar{x}) = 0.$$

Center (cont)

Median = middle value = $M$
Arrange the values in increasing order.
If $n$ is odd, the median $M$ is located in position $(n+1)/2$.
If $n$ is even, the median $M$ is the average of the middle two values, at positions $n/2$ and $(n/2)+1$.
Mode = most frequent value
The mode need not be unique. A distribution is called unimodal if there is a single prominent peak; bimodal if there are two prominent peaks.
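
The position rules for the median and the balancing-point property of the mean can be checked with a short Python sketch. The data set is small and made up for illustration, and the helper names are assumptions.

```python
from collections import Counter

def mean(values):
    return sum(values) / len(values)

def median(values):
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        # n odd: the value at position (n+1)/2 (positions counted from 1)
        return ordered[(n + 1) // 2 - 1]
    # n even: average of the values at positions n/2 and (n/2)+1
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2

def modes(values):
    """Return every most-frequent value, since the mode need not be unique."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

data = [2, 3, 3, 5, 7, 10]  # small made-up data set
xbar = mean(data)
deviation_sum = sum(x - xbar for x in data)
print(xbar, median(data), modes(data), deviation_sum)
```

For real data the sum of deviations may come out as a tiny nonzero number because of floating-point rounding; in exact arithmetic it is always zero.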

Shape

[Figure: (a) Skewed left (negative); (b) Normal distribution (example of a symmetric distribution); (c) Skewed right (positive)]
Note: The direction of skewness is the direction in which the tail is pulling the distribution.

Variability

Range = highest value (max) − lowest value (min)
Interquartile range (IQR) = $Q_3 - Q_1$, where
$Q_3$ = upper quartile (75th percentile) = median of upper ½
$Q_1$ = lower quartile (25th percentile) = median of lower ½

Variability (cont)

Standard deviation measures variability by summarizing how far individual data values generally are from the mean.
It's most useful for bell-shaped data.
Interpret the standard deviation as roughly the average distance values fall from the mean.

Calculating the Sample Standard Deviation

Formula for the (sample) standard deviation:
$$s = \sqrt{\frac{(x_1 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n-1}} = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n-1}}$$
An equivalent formula, easier to compute, is:
$$s = \sqrt{\frac{\sum x_i^2 - n\bar{x}^2}{n-1}}$$
The value of $s^2$ is called the (sample) variance.

Example

Data: 90, 90, 100, 110, 110 ($n = 5$ observations)
$\bar{x} = (90+90+100+110+110)/5 = 100$

Observation $x_i$    Deviation $x_i - \bar{x}$    Squared deviation $(x_i - \bar{x})^2$
90                   -10                          100
90                   -10                          100
100                  0                            0
110                  10                           100
110                  10                           100
Sum = 500            Sum = 0                      Sum = 400

Sample variance: $s^2 = \frac{400}{5-1} = 100$
Sample standard deviation: $s = \sqrt{100} = 10$

The following dotplots represent ratings of statistics from three different classes. Which class has the largest variability in their ratings? Why? (No calculations are necessary.)
[Figure: dotplots for classes A, B, and C]
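
The worked example with data 90, 90, 100, 110, 110 can be reproduced in Python using both forms of the formula. This is a sketch for checking the arithmetic; the function names are assumptions.

```python
import math

def sample_variance(values):
    """Definition: sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    xbar = sum(values) / n
    return sum((x - xbar) ** 2 for x in values) / (n - 1)

def sample_variance_shortcut(values):
    """Equivalent form: (sum of x_i^2 minus n * xbar^2), divided by n - 1."""
    n = len(values)
    xbar = sum(values) / n
    return (sum(x * x for x in values) - n * xbar ** 2) / (n - 1)

data = [90, 90, 100, 110, 110]
s2 = sample_variance(data)
s = math.sqrt(s2)
print(s2, s)  # 100.0 10.0
```

The shortcut form gives the same answer here, but with large values and small spread it can lose precision in floating-point arithmetic, so the definitional form is usually the safer choice in code.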

The following dotplots represent ratings of statistics from two different classes. Which class would you say has more variability in their ratings? (No calculations are necessary.)
[Figure: dotplots for classes A and B]

Outliers

An outlier is a data value that doesn't fit the pattern of the majority of the data.
Rule of thumb: An observation is considered an outlier if it is either greater than $Q_3 + 1.5 \times \text{IQR}$, or less than $Q_1 - 1.5 \times \text{IQR}$.

Influence of Outliers and Shape on the Mean and Median

Outliers have a larger influence on the mean than on the median. Why?
Example:
1, 2, 3, 4, 5: Mean = 3, Median = 3
1, 2, 3, 4, 5, 100: Mean ≈ 19.2, Median = 3.5
Symmetric → Mean = Median
Skewed → Mean pulled in the direction of skew:
Skewed left: Mean < Median
Skewed right: Mean > Median
The median is a robust estimate (resistant to outliers); the mean is not a robust estimate.

Boxplots

[Picture by John Landers: http://www.causeweb.org]

Five-Number Summary

1. Minimum (smallest value)
2. $Q_1$ = Lower Quartile = 25th Percentile = median of lower half of the ordered data values (not including median)
3. Median = middle value = 50th Percentile
4. $Q_3$ = Upper Quartile = 75th Percentile = median of upper half of the ordered data values (not including median)
5. Maximum (largest value)

Example: Fastest Speeds Ever Driven

Penn State statistics students were asked, "What is the fastest speed you have ever driven?" (mph)
Ordered data (in rows of 10 values) for n = 87 males:
55 60 80 80 80 80 85 85 85 85
90 90 90 90 90 92 94 95 95 95
95 95 95 100 100 100 100 100 100 100
100 100 101 102 105 105 105 105 105 105
105 105 109 110 110 110 110 110 110 110
110 110 110 110 110 112 115 115 115 115
115 115 120 120 120 120 120 120 120 120
120 120 124 125 125 125 125 125 125 130
130 140 140 140 140 145 150
Find the five-number summary for these data.
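
As a check on hand calculations, the five-number summary and the 1.5 × IQR rule of thumb can be sketched in Python. The quartiles below follow the definition given above (median of each half, not including the median when n is odd); statistical software often uses other quartile conventions and may give slightly different values. The function names and the (value, count) encoding of the speeds are choices made for this sketch.

```python
def median(ordered):
    """Median of an already-sorted list."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 == 1 else (ordered[mid - 1] + ordered[mid]) / 2

def five_number_summary(values):
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]                                   # below the median position
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return ordered[0], median(lower), median(ordered), median(upper), ordered[-1]

def outliers(values):
    """Rule of thumb: flag values beyond 1.5 * IQR from the nearest quartile."""
    _, q1, _, q3, _ = five_number_summary(values)
    step = 1.5 * (q3 - q1)
    return [x for x in values if x < q1 - step or x > q3 + step]

# (value, count) pairs transcribed from the ordered fastest-speed data (n = 87).
speed_counts = [
    (55, 1), (60, 1), (80, 4), (85, 4), (90, 5), (92, 1), (94, 1), (95, 6),
    (100, 9), (101, 1), (102, 1), (105, 8), (109, 1), (110, 12), (112, 1),
    (115, 6), (120, 10), (124, 1), (125, 6), (130, 2), (140, 4), (145, 1),
    (150, 1),
]
speeds = [value for value, count in speed_counts for _ in range(count)]

summary = five_number_summary(speeds)
flagged = outliers(speeds)
print(len(speeds), summary, flagged)
```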

How to Draw a Boxplot and Identify Outliers

1. Draw a horizontal (or vertical) axis with equally spaced values from lowest to highest in the data. Make sure you label your axis with the variable name and units of measurement (e.g., "Fastest speed driven (mph)").
2. Draw a rectangle (box) with ends at the quartiles.
3. Draw a line in the box at the value of the median.
4. Compute 1.5 × IQR. Any value more than this distance from the closest quartile is considered an outlier.
5. Draw a line (whisker) from each end of the box extending to the farthest data value that is not an outlier. If there is no outlier, the whiskers extend to the min or max.
6. Draw asterisks/dots to indicate the outliers.
Draw a boxplot for the fastest speeds.

Boxplot for Fastest Speeds

1. Draw a horizontal line from 55 to 150 and label it.
2. Draw a rectangle with ends at 95 and 120.
3. Draw a line in the box at the median of 110.
4. Compute IQR = 120 − 95 = 25.
5. Compute 1.5(IQR) = 1.5(25) = 37.5; an outlier is any value below 95 − 37.5 = 57.5 or above 120 + 37.5 = 157.5.
6. Draw a line from each end of the box, extending down to 60 (the smallest data value that is not an outlier) and up to 150.
7. Draw an asterisk/dot at the outlier of 55 mph.
Always label your axes!
[Figure: boxplot of the fastest speeds; axis: Fastest Speed Driven (mph)]

Comparing two groups

[Figure: Fastest Speed Driven (mph) by group, Female and Male]

What to look for in the relationship between two quantitative variables?

1. Form: what is the form of the relationship between the two variables? (e.g., linear, quadratic, piece-wise linear)
2. Strength: how strong is the relationship? (e.g., how closely do the points follow the form?)
3. Direction: what is the direction of the relationship?
Positive association = as one variable increases, the other tends to increase.
Negative association = as one variable increases, the other tends to decrease.
4. Outliers: are there any observations that do not fit the overall pattern of the distribution?

http://www.npr.org/2017/08/18/544265493/chart-the-relationship-between-seeing-discrimination-and-voting-fortrump