STATISTICS 201. Survey: Provide this Info. How familiar are you with these? Survey, continued IMPORTANT NOTE. Regression and ANOVA 9/29/2013


STATISTICS 201

Outline for today:
Go over syllabus
Provide requested information on survey (handed out in class)
Brief introduction and hands-on activity

Survey: Provide this Info
Name
Major/Program
Year in school or in graduate program
Something interesting about yourself
Why you are taking this class

Survey, continued: How familiar are you with these?
(Not at all, somewhat, very, could teach it)
1. Summation notation
2. Hypothesis testing in general
3. P-values
4. Confidence intervals
5. Two-sample t-test
6. Sampling distributions
7. F-distribution
8. Scatter plots
9. Simple linear regression
10. Matrices

Provide the following data:
a. Your height, in inches (to the nearest half inch) or centimeters (note 1 cm = 0.3937 inches)
b. Your handspan in centimeters, defined as the distance covered on the ruler by your stretched hand from the tip of the thumb to the tip of the small finger.
c. Predicted handspan = -3 + 0.35 (ht in inches) or -3 + 0.1378 (ht in centimeters)
d. Your residual (to be explained!)

Regression and ANOVA
Used to describe the relationship between a continuous response variable and one or more predictor variables (continuous = regression; categorical = ANOVA).
Regression is used to predict a future response using known, current values of the predictors, or to estimate the relationship.
ANOVA is used to figure out why means differ for different groups, treatments, etc.
First we need to discuss how the data collection method affects potential conclusions. Very important!
Switch to PowerPoint slides modified from Brooks/Cole to accompany Mind On Statistics by Utts/Heckard.

IMPORTANT NOTE
The remaining slides are modified from PowerPoint presentations to accompany Mind on Statistics, by Utts and Heckard, and are copyright Brooks/Cole. They are not to be copied or used for purposes other than this class.

Gathering Useful Data
See Section 1.4 of our 201 textbook.

Principal Idea
The knowledge of how the data were generated is one of the key ingredients for translating data intelligently.

Description or Decision? Using Data Wisely
Descriptive Statistics: using numerical and graphical summaries to characterize a data set (and only that data set).
Inferential Statistics: using sample information to make conclusions about a broader range of individuals than just those observed.

Two Important Issues Based on Data Collection Method
Extending results to a population: This can be done if the data are representative of a larger population for the question of interest. Safest to use a random sample.
Cause and effect conclusion: Can only be made if data are from a randomized experiment, not from an observational study.

Definitions of Types of Studies
Observational Study: Researchers observe or question participants about opinions, behaviors, or outcomes. Participants are not asked to do anything differently. Two special cases: sample surveys and case-control studies.
Experiment: Researchers manipulate something and measure the effect of the manipulation on some outcome of interest. Sometimes an experiment cannot be conducted due to practical/ethical issues.
Randomized experiments: participants are randomly assigned to participate in one condition (called a treatment) or another. Random assignment is NOT the same thing as random sampling.
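The difference between random sampling and random assignment can be sketched in a few lines of Python. This is only an illustration: the population of 1000 numbered people, the sample size of 40, and the seed are all made-up values, not taken from the slides.

```python
import random

# Hypothetical population: 1000 people identified by number (an assumption).
population = list(range(1000))
rng = random.Random(201)  # fixed seed so the sketch is repeatable

# Random SAMPLING decides who is in the study at all.
# It supports extending results to the population.
sample = rng.sample(population, 40)

# Random ASSIGNMENT decides which condition each participant receives.
# It supports cause-and-effect conclusions.
shuffled = sample[:]
rng.shuffle(shuffled)
treatment = shuffled[:20]
control = shuffled[20:]

print("treatment group size:", len(treatment))
print("control group size:", len(control))
```

A study can use either step without the other. Volunteers who are randomly assigned to treatments still form a randomized experiment, even though they are not a random sample.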

Types of Variables (Measured or Not)
An explanatory variable (or independent variable) is one that may explain or may cause differences in a response variable (or outcome or dependent variable).
A confounding variable is a variable that affects the response variable and also is related to the explanatory variable.
A potential confounding variable not measured in the study is called a lurking variable.

Obs. Study: Lead Exposure and Bad Teeth
"Children exposed to lead are more likely to suffer tooth decay" (USA Today)
Observational study involving 24,901 children.
Explanatory variable = level of lead exposure.
Response variable = extent to which the child has missing/decayed teeth.
Possible confounding variables = income level, diet, time since last dental visit.
Lurking variables = amount of fluoride in water, health care.

CRUCIAL POINT
This study is an observational study. We cannot conclude that lead exposure causes tooth decay.
It would be unethical to do a randomized experiment, so we need other (nonstatistical) ways to establish cause and effect.

Randomized Experiment: Quitting Smoking with Nicotine Patches
"After the eight-week period of patch use, almost half (46%) of the nicotine group had quit smoking, while only one-fifth (20%) of the placebo group had." (Newsweek, March 9, 1993, p. 62)
Double-blind, placebo-controlled randomized experiment.
240 smokers recruited (volunteers).
Randomized to 22-mg nicotine patch or placebo (controlled) patch for 8 weeks.
Double-blind: neither the participants nor the nurses taking the measurements knew who had received the active nicotine patches.

CRUCIAL POINT
This study is a randomized experiment. We can conclude that nicotine patches cause people to quit smoking.
Potential confounding variables should be similar in the placebo and nicotine patch groups because of random assignment.

Relationships Between Quantitative Variables

Three Tools We Will Use
Scatterplot, a two-dimensional graph of data values.
Correlation, a statistic that measures the strength and direction of a linear relationship.
Regression equation, an equation that describes the average relationship between a response and explanatory variable.

Looking for Patterns with Scatterplots
Questions to Ask about a Scatterplot
What is the average pattern? Does it look like a straight line or is it curved?
What is the direction of the pattern?
How much do individual points vary from the average pattern?
Are there any unusual data points?

Positive/Negative Association
Two variables have a positive association when the values of one variable tend to increase as the values of the other variable increase.
Two variables have a negative association when the values of one variable tend to decrease as the values of the other variable increase.

Example: Height and Handspan
Data:

Height (in.)   Span (cm)
71             23.5
69             22.0
66             18.5
64             20.5
71             21.0
72             24.0
67             19.5
65             20.5
76             24.5
67             20.0
70             23.0
62             17.0

and so on, for n = 167 observations.
Data shown are the first 12 observations of a data set that includes the heights (in inches) and fully stretched handspans (in centimeters) of 167 college students.

Example, cont.: Height and Handspan
Taller people tend to have greater handspan measurements than shorter people do.
When two variables tend to increase together, we say that they have a positive association.
The handspan and height measurements may have a linear relationship.

Example: Driver Age and Maximum Legibility Distance of Highway Signs
A research firm determined the maximum distance at which each of 30 drivers could read a newly designed sign.
The 30 participants in the study ranged in age from 18 to 82 years old.
We want to examine the relationship between age and the sign legibility distance.
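As a rough illustration of the correlation tool, the sketch below computes a correlation for the 12 height and handspan observations listed in the Height and Handspan example. It assumes the usual Pearson formula for r, which the slides do not spell out, and it uses only these 12 rows, so the result need not match a value computed from all 167 students.

```python
from math import sqrt

# First 12 observations from the Height and Handspan example.
heights_in = [71, 69, 66, 64, 71, 72, 67, 65, 76, 67, 70, 62]
spans_cm = [23.5, 22.0, 18.5, 20.5, 21.0, 24.0,
            19.5, 20.5, 24.5, 20.0, 23.0, 17.0]


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def correlation(x, y):
    """Pearson correlation r between two equal-length lists (assumed formula)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


r = correlation(heights_in, spans_cm)
print(f"r for the 12 listed observations: {r:.2f}")
```

A positive r here agrees with the positive association described for height and handspan.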

Example: Driver Age and Maximum Legibility Distance of Highway Signs
We see a negative association with a linear pattern.
We will use a straight-line equation to model this relationship.

Example: The Development of Musical Preferences
The 108 participants in the study ranged in age from 16 to 86 years old.
We want to examine the relationship between song-specific age (age in the year the song was popular) and musical preference (positive score => above average, negative score => below average).
Note that a negative song-specific age means the person wasn't born yet when the song was popular.

Example: The Development of Musical Preferences
Popular music preferences are acquired in late adolescence and early adulthood.
The association is nonlinear.

Describing Linear Patterns with a Regression Line
When the best equation for describing the relationship between x and y is a straight line, the equation is called the regression line.
Two purposes of the regression line:
to estimate the average value of y at any specified value of x
to predict the value of y for an individual, given that individual's x value

Example: Height and Handspan (cont)
Regression equation: Handspan = -3 + 0.35 Height
Estimate the average handspan for people 60 inches tall: Average handspan = -3 + 0.35(60) = 18 cm.
Predict the handspan for someone who is 60 inches tall: Predicted handspan = -3 + 0.35(60) = 18 cm.

Example: Height and Handspan (cont)
Regression equation: Handspan = -3 + 0.35 Height
Slope = 0.35 => Handspan increases by 0.35 cm, on average, for each increase of 1 inch in height.
In a statistical relationship, there is variation from the average pattern.
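The two uses of the regression line give the same number, which a short Python sketch makes concrete. The function name is made up for illustration; the coefficients -3 and 0.35 are the ones given in the Height and Handspan example.

```python
def predicted_handspan_cm(height_in):
    """Handspan (cm) from the regression line: Handspan = -3 + 0.35 * Height (inches)."""
    return -3 + 0.35 * height_in


# Estimating the average handspan of all people 60 inches tall and
# predicting the handspan of one person 60 inches tall use the same value.
average_at_60 = predicted_handspan_cm(60)
prediction_for_one_person = predicted_handspan_cm(60)
print(f"{average_at_60:.2f} cm")

# The slope is the change in predicted handspan per extra inch of height.
slope = predicted_handspan_cm(61) - predicted_handspan_cm(60)
print(f"{slope:.2f} cm per inch")
```

The interpretation differs even though the arithmetic does not: an individual will usually vary from the average pattern.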

The Equation for the Regression Line (for a sample, not a population)
ŷ = b0 + b1 x
ŷ is spoken as "y-hat," and it is also referred to either as predicted y or estimated y.
b0 is the intercept of the straight line. The intercept is the predicted value of y when x = 0.
b1 is the slope of the straight line. The slope tells us how much of an increase (or decrease) there is for the y variable when the x variable increases by one unit. The sign of the slope tells us whether y increases or decreases when x increases.

Prediction Errors and Residuals
Prediction Error = difference between the observed value of y and the predicted value ŷ.
Residual = y - ŷ

Let's predict your handspan
Record these on your survey.
Regression equation: ŷ = b0 + b1 x
Handspan (cm) = -3 + 0.35 Height (inches)
or -3 + 0.1378 Height (cm)
Calculate your predicted handspan. Examples:
-3 + (0.35)(70 inches) = 21.5 cm
-3 + (0.35)(65 inches) = 19.75 cm
-3 + (0.1378)(165 cm) = 19.74 cm
Find your residual: (actual handspan - predicted handspan)

Measuring Strength and Direction with Correlation
Correlation r indicates the strength and the direction of a straight-line relationship.
The strength of the relationship is determined by the closeness of the points to a straight line.
The direction is determined by whether one variable generally increases or generally decreases when the other variable increases.

Example: Height and Handspan (cont)
Regression equation: Handspan = -3 + 0.35 Height
Correlation r = +0.74 => a somewhat strong positive linear relationship.

Example: Driver Age and Maximum Legibility Distance of Highway Signs (cont)
Regression equation: Distance = 577 - 3 Age
Correlation r = -0.8 => a somewhat strong negative linear association.
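The residual calculation from the "Prediction Errors and Residuals" slide can be sketched in Python as follows. The function names are hypothetical. The two sample rows (71 in., 23.5 cm and 66 in., 18.5 cm) come from the listed height and handspan data, and the coefficients are those of the handspan regression equation.

```python
def predicted_from_inches(height_in):
    """Predicted handspan (cm) when height is given in inches."""
    return -3 + 0.35 * height_in


def predicted_from_cm(height_cm):
    """Predicted handspan (cm) when height is given in centimeters.

    0.1378 is 0.35 * 0.3937, the slope rescaled from inches to centimeters.
    """
    return -3 + 0.1378 * height_cm


def residual_cm(height_in, actual_span_cm):
    """Residual = actual handspan - predicted handspan, in centimeters."""
    return actual_span_cm - predicted_from_inches(height_in)


# Positive residual: handspan is larger than the line predicts.
above = residual_cm(71, 23.5)
# Negative residual: handspan is smaller than the line predicts.
below = residual_cm(66, 18.5)
print(f"residual for (71 in., 23.5 cm): {above:.2f} cm")
print(f"residual for (66 in., 18.5 cm): {below:.2f} cm")
```

The sign of the residual says whether a point lies above or below the regression line, and its size says how far the point is from the average pattern.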

Example: Left and Right Handspans
If you know the span of a person's right hand, can you accurately predict his/her left handspan?
Correlation r = +0.95 => a very strong positive linear relationship.

Example: Verbal SAT and GPA
Grade point averages (GPAs) and verbal SAT scores for a sample of 100 university students.
Correlation r = 0.485 => a moderately strong positive linear relationship.

Example: Age and Hours of TV Viewing
Relationship between age and hours of daily television viewing for 1913 survey respondents.
Correlation r = 0.12 => a weak connection.
Note: a few claimed to watch more than 20 hours/day!

Example: Hours of Sleep and Hours of Study
Relationship between reported hours of sleep the previous 24 hours and the reported hours of study during the same period for a sample of 116 college students.
Correlation r = -0.36 => a not too strong negative association. (More study, less sleep)

Summary
Regression is used to do two things:
Predict future values using information available now. (Predict response from explanatory variable.)
Estimate the average relationship between a response and one or more explanatory variables.
Regression only works for linear relationships.

Homework
Problems 1.12, 1.13, 1.29 and 1.30
Due next Wed (October 9) by 6pm