Examining Relationships: Least-squares regression (Section 2.3)


The regression line
A regression line describes a one-way linear relationship between variables.
An explanatory variable, x, explains variability in a response variable, y.
Often one wants to predict y from a given x.

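The least-squares regression line
The least-squares regression line is $\hat{y} = b_0 + b_1 x$, with slope $b_1 = r \, s_y / s_x$ and intercept $b_0 = \bar{y} - b_1 \bar{x}$.
Here $s_y$ is the standard deviation in y, $s_x$ is the standard deviation in x, $r$ is the correlation of x and y, and $\bar{x}$ and $\bar{y}$ are the means of x and y.
A prediction, ŷ, is made by plugging in a value of x.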
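As a rough illustration of the slope and intercept formulas, the sketch below computes the line from the correlation and the two standard deviations. The function name least_squares_line and the data values are made up for this example; they do not come from the slides.

```python
from statistics import mean, stdev

def least_squares_line(x, y):
    """Return (b0, b1) using b1 = r * s_y / s_x and b0 = y_bar - b1 * x_bar."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)  # sample standard deviations (divisor n - 1)
    # Sample correlation: sum of products of standardized values, divided by n - 1.
    r = sum((xi - x_bar) / s_x * (yi - y_bar) / s_y for xi, yi in zip(x, y)) / (n - 1)
    b1 = r * s_y / s_x
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical data, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = least_squares_line(x, y)
y_hat_at_6 = b0 + b1 * 6  # prediction: plug x = 6 into the fitted line
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, prediction at x = 6: {y_hat_at_6:.2f}")
```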

Interpretations
The least-squares line minimizes the sum of squared prediction errors.
Note: the prediction errors are vertical (measured in y), so interchanging x and y would change the fitted line.
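A small numerical check can make both points concrete. The sketch below, using made-up data and hypothetical helper names (fit_line, sse), compares the sum of squared vertical errors of the fitted line against slightly shifted lines, and then fits x on y to show that the roles of the variables are not interchangeable.

```python
def fit_line(x, y):
    """Closed-form least-squares fit of y on x; returns (b0, b1)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    return y_bar - b1 * x_bar, b1

def sse(x, y, b0, b1):
    """Sum of squared vertical prediction errors for the line b0 + b1 * x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

# Hypothetical data, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_line(x, y)
best = sse(x, y, b0, b1)

# Every nearby line (intercept and/or slope shifted by 0.1) has a larger error sum.
shifts = (-0.1, 0.0, 0.1)
nearby = [sse(x, y, b0 + d0, b1 + d1)
          for d0 in shifts for d1 in shifts if (d0, d1) != (0.0, 0.0)]

# Regressing x on y minimizes horizontal errors instead, giving a different line.
c0, c1 = fit_line(y, x)
slope_from_x_on_y = 1 / c1  # slope of that line when drawn in the (x, y) plane
print(best, min(nearby), b1, slope_from_x_on_y)
```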

Interpretations
The slope, $b_1$, is the amount of change in ŷ when x increases by one unit.
The intercept, $b_0$, is the prediction, ŷ, at x = 0.
Example: BAC data.
Each beer increases predicted BAC by 0.0180.
Predicted BAC after no beers is -0.0127 ≈ 0; the small negative value is attributable to random variability.
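The two interpretations can be read directly off a prediction function. The sketch below uses the rounded BAC coefficients quoted in the example; the function name predicted_bac is an assumption made for illustration.

```python
# Rounded coefficients quoted in the BAC example.
B0 = -0.0127  # intercept: predicted BAC at x = 0 beers
B1 = 0.0180   # slope: change in predicted BAC per additional beer

def predicted_bac(beers):
    """Predicted BAC from the fitted line, y_hat = b0 + b1 * x."""
    return B0 + B1 * beers

at_zero = predicted_bac(0)                        # equals the intercept
per_beer = predicted_bac(5) - predicted_bac(4)    # equals the slope
print(f"prediction at 0 beers: {at_zero:.4f}")
print(f"change from 4 to 5 beers: {per_beer:.4f}")
```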

Coefficient of determination, $r^2$
The coefficient of determination, $r^2$, measures the proportion of the variability in y that is explained by the regression line:
$r^2 = \dfrac{\text{variance of } \hat{y}}{\text{variance of } y}$
In the slide's example, $r^2 = 0.76$.
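The variance-ratio reading of r squared can be checked numerically. The following is a minimal sketch on made-up data; it computes the squared correlation and the ratio of the variance of the fitted values to the variance of y, which should agree up to rounding error.

```python
from statistics import mean, variance

# Hypothetical data, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

x_bar, y_bar = mean(x), mean(y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)

# Least-squares fit and fitted values.
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

r = s_xy / (s_xx * s_yy) ** 0.5          # correlation of x and y
ratio = variance(y_hat) / variance(y)    # variance in y-hat over variance in y
print(f"r^2 = {r ** 2:.4f}, variance ratio = {ratio:.4f}")
```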

Residuals
Analysis of the residuals, y - ŷ, helps to assess the suitability of a linear relationship.

Residual plots
The ideal plot of the residuals (y - ŷ against x) would exhibit no systematic pattern.

Problem indicators
Systematic patterns suggest complications and possible invalidity in the use of linear regression.
Curved pattern: deviations from the linear form.
Trends in spread: less prediction accuracy in some regions of x.
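To see what a curved residual pattern looks like, the sketch below fits a straight line to deliberately curved, made-up data (y = x squared) and lists the residuals y - ŷ. The data are an assumption chosen only to make the pattern obvious.

```python
# Hypothetical data with a curved relationship, invented for illustration.
x = [0, 1, 2, 3, 4]
y = [xi ** 2 for xi in x]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Residual = observed y minus predicted y.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

for xi, res in zip(x, residuals):
    print(f"x = {xi}: residual = {res:+.1f}")
# Positive at both ends and negative in the middle: a systematic, curved pattern.
```

Least-squares residuals always sum to zero, so the warning sign is the shape of the pattern against x, not the overall level of the residuals.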

Influential observations
An influential observation is an observation whose deletion would drastically change the regression line.
[Figure: regression lines fitted to the complete data and to the data with point #3 deleted.]
An influential observation is often an outlier in x, but may not be an outlier in y.
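One way to check for influence is to refit the line with the suspect point removed and compare slopes. The sketch below does this on made-up data in which the last point is an outlier in x but not in y; the helper name slope_of is hypothetical.

```python
def slope_of(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return s_xy / s_xx

# Hypothetical data: the last point has an extreme x but an ordinary y.
x = [1, 2, 3, 4, 10]
y = [1, 2, 3, 4, 2]

slope_all = slope_of(x, y)                   # fit to the complete data
slope_deleted = slope_of(x[:-1], y[:-1])     # fit with the last point deleted
print(f"complete data: slope = {slope_all:.2f}")
print(f"point deleted: slope = {slope_deleted:.2f}")
```

A large change between the two fits, as here, is what marks the deleted point as influential.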

Examining Relationships: Cautions about correlation and regression (Section 2.4)

Basic cautions
Correlation is for two-way relationships; regression is for one-way relationships.
Both are only relevant for linear relationships.
Neither is resistant.

Extrapolation
Extrapolation is when predictions are made outside the range of the data.
The linear relationship may be untrustworthy outside the range of the data.
Example: BAC data.
Predicted BAC after x = 24 beers: ŷ = 0.4184.
Predicted BAC after x = 36 beers: ŷ = 0.6340.
Actual BAC values this high would mean unconsciousness or death.
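A prediction routine can flag extrapolation explicitly. The sketch below uses the rounded BAC coefficients from the earlier example (-0.0127 and 0.0180); the observed range of 1 to 9 beers is a hypothetical assumption, since the range of the data is not stated in the text. With rounded coefficients the predictions (about 0.419 and 0.635) differ slightly from the quoted 0.4184 and 0.6340, presumably because those were computed from unrounded coefficients.

```python
B0, B1 = -0.0127, 0.0180   # rounded coefficients from the BAC example
OBSERVED_RANGE = (1, 9)    # hypothetical range of x in the data (an assumption)

def predict_bac(beers, observed_range=OBSERVED_RANGE):
    """Return (prediction, is_extrapolation) for a given number of beers."""
    low, high = observed_range
    y_hat = B0 + B1 * beers
    return y_hat, not (low <= beers <= high)

for beers in (5, 24, 36):
    y_hat, extrapolated = predict_bac(beers)
    note = "extrapolation, treat with caution" if extrapolated else "within observed range"
    print(f"{beers:2d} beers: predicted BAC = {y_hat:.4f} ({note})")
```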

Lurking variables
A lurking variable may influence the relationship between variables.
An unobserved lurking variable may explain puzzling associations.
Example: higher rates of red-wine drinking are associated with better levels of overall health.
Possible lurking variables: income, other lifestyle tendencies, etc.

Association is not causation
An observed association may reflect the influence of a causal lurking variable.
An experiment that controls lurking variables is best for establishing causation.
Example: BAC: control weight, gender, etc.
Causation may be established in other ways, but with weaker evidence.

Examining Relationships: Relationships in categorical data (Section 2.5)

Two-way tables
Relationships in categorical data may be explored by compiling variables in two-way tables.
Row variable: Education. Column variable: Age group. Counts are in thousands.
Education            25-34    35-54      55+
< High school         4459     9174    14226
High school          11562    26455    20060
College, 1-3 yrs.    10693    22647    11125
College, 4+ yrs.     11071    23160    10597

Marginal distributions
The marginal distributions are the individual distributions of the row and column variables. (They appear in the margins of the two-way table.)
Counts are in thousands; columns are age groups.
Education            25-34    35-54      55+    Row totals
< High school         4459     9174    14226         27859
High school          11562    26455    20060         58077
College, 1-3 yrs.    10693    22647    11125         44465
College, 4+ yrs.     11071    23160    10597         44828
Column totals        37785    81436    56008        175229
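The margins can be recomputed from the cell counts. The sketch below stores the table as a dictionary (the variable names are chosen for illustration) and derives the row totals, column totals, and the two marginal distributions as proportions of the grand total.

```python
ages = ["25-34", "35-54", "55+"]

# Cell counts (thousands) from the two-way table, one row per education level.
counts = {
    "< High school":     [4459, 9174, 14226],
    "High school":       [11562, 26455, 20060],
    "College, 1-3 yrs.": [10693, 22647, 11125],
    "College, 4+ yrs.":  [11071, 23160, 10597],
}

row_totals = {edu: sum(row) for edu, row in counts.items()}
col_totals = [sum(col) for col in zip(*counts.values())]
grand_total = sum(row_totals.values())

# Marginal distributions: each margin divided by the grand total.
education_marginal = {edu: t / grand_total for edu, t in row_totals.items()}
age_marginal = {age: t / grand_total for age, t in zip(ages, col_totals)}

for edu, p in education_marginal.items():
    print(f"{edu:<18} {p:.1%}")
for age, p in age_marginal.items():
    print(f"{age:<18} {p:.1%}")
```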

Conditional distributions
A conditional distribution is calculated from the counts of one variable limited to a given category of the other variable.
Example: among people with less than a high-school education (27859 thousand), 4459 thousand are aged 25-34, so the conditional proportion is 4459/27859 ≈ 16%.

Visualizing relationships
Describe relationships with conditional distributions.
Row percentages: the distribution of age group within each education level.
Education            25-34    35-54      55+    Row totals
< High school          16%      33%      51%          100%
High school            20%      46%      35%          100%
College, 1-3 yrs.      24%      51%      25%          100%
College, 4+ yrs.       25%      52%      24%          100%
Rounded percentages may not add to exactly 100%.
[Figure: four bar charts, one per education level (< High school; High school; College, 1-3 yrs.; College, 4+ yrs.), showing the percentage in each age group (25-34, 35-54, 55+).]
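The row percentages follow from dividing each cell by its row total. The sketch below recomputes them from the counts in the two-way table; the helper name row_percents is an assumption made for illustration.

```python
ages = ["25-34", "35-54", "55+"]

# Cell counts (thousands) from the two-way table, one row per education level.
counts = {
    "< High school":     [4459, 9174, 14226],
    "High school":       [11562, 26455, 20060],
    "College, 1-3 yrs.": [10693, 22647, 11125],
    "College, 4+ yrs.":  [11071, 23160, 10597],
}

def row_percents(row):
    """Conditional distribution of age group within one education level, in whole percent."""
    total = sum(row)
    return [round(100 * c / total) for c in row]

conditional = {edu: row_percents(row) for edu, row in counts.items()}

for edu, percents in conditional.items():
    cells = "  ".join(f"{age}: {p}%" for age, p in zip(ages, percents))
    print(f"{edu:<18} {cells}")
```

Comparing the rows shows the relationship: the share aged 55+ is much larger in the lowest education level than in the college levels.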

Producing data: Introduction (Chapter 3)

Observational studies and experiments
Central issue: the (undesirable) possibility of confounding between an explanatory variable and a lurking variable.
In an observational study, individuals are observed, but no attempt is made to control the conditions of data production.
Observational studies are often plagued by confounding with lurking variables.
In an experiment, the conditions of data production are controlled by applying treatments to individuals.
A well-designed experiment avoids confounding with lurking variables.

Producing data: Designing samples (Section 3.1)

Key elements of a sampling study
Population: a collection of individuals about which the conclusions of statistical inference are to be relevant.
Sample: the subset of a population on which data are measured and put to analysis.
Sampling design: the method used to select the sample from the population.

Biased sampling designs
Biased sampling: favors some portions of the population over others.
Examples:
Voluntary sampling: individuals are self-selected by responding to an incentive.
Convenience sampling: selection is determined by the convenience of the selection-maker.

Probability sampling
Probability sampling uses chance to select a sample, based on known selection probabilities.
Chance mechanisms: drawing labels from a hat, computer simulation, a table of random numbers.
Any bias is accommodated using knowledge of the selection probabilities.
Examples:
Simple random sampling: each fixed-size subset has the same probability of selection. Unbiased.
Stratified sampling: simple random samples are drawn in distinct strata and aggregated.
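Both designs can be sketched with computer simulation. The example below uses Python's random module; the population labels, stratum names, sample sizes, and seed are all hypothetical choices made for illustration.

```python
import random

def simple_random_sample(population, n, rng):
    """Draw n distinct individuals; every size-n subset is equally likely."""
    return rng.sample(population, n)

def stratified_sample(strata, n_per_stratum, rng):
    """Draw a simple random sample within each stratum and aggregate the results."""
    sample = []
    for name, members in strata.items():
        sample.extend(rng.sample(members, n_per_stratum[name]))
    return sample

rng = random.Random(2024)  # fixed seed so the sketch is repeatable

# Hypothetical population of 20 labeled individuals.
population = [f"unit{i:02d}" for i in range(20)]
srs = simple_random_sample(population, 5, rng)

# Hypothetical strata: the first 12 individuals form stratum A, the rest stratum B.
strata = {"A": population[:12], "B": population[12:]}
strat = stratified_sample(strata, {"A": 3, "B": 2}, rng)

print("SRS:       ", srs)
print("Stratified:", strat)
```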

Other sources of bias
Under-coverage in the population list.
Non-response of sampled individuals.
Inaccurate responses of the respondent (response bias). These may be unintentionally encouraged by the interviewer.
Poor questionnaire design and wording.

Producing data: Designing experiments (Section 3.2)

Terminology
General                          Experiments
Individuals                      Subjects
Explanatory variables            Factors
Value of explanatory variable    Level of a factor
The level of a factor reflects the application of a treatment used to modify the experimental conditions in a specific way.

Principles of experimental design
Use comparisons to cancel the effects of lurking variables. A control group (i.e., sham treatment, or placebo) may serve as a baseline comparison.
Use randomization to allocate subjects among treatments.
Use replication to reduce random variability.
Patterns in the response are statistically significant if they are of such magnitude that they would rarely be observed by chance.
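Randomized allocation can be sketched as a shuffle followed by dealing subjects out to the treatments in turn. The subject labels, treatment names, seed, and the function name randomize below are hypothetical, chosen only to illustrate the principle.

```python
import random

def randomize(subjects, treatments, rng):
    """Randomly allocate subjects among treatments in (nearly) equal-sized groups."""
    shuffled = list(subjects)
    rng.shuffle(shuffled)  # chance, not the experimenter, decides the order
    groups = {t: [] for t in treatments}
    for i, subject in enumerate(shuffled):
        # Deal the shuffled subjects out to the treatments in rotation.
        groups[treatments[i % len(treatments)]].append(subject)
    return groups

rng = random.Random(7)  # fixed seed so the sketch is repeatable

# Hypothetical subjects and treatments; "placebo" plays the control-group role.
subjects = [f"subject{i}" for i in range(1, 13)]
treatments = ["treatment", "placebo"]
groups = randomize(subjects, treatments, rng)

for name, members in groups.items():
    print(f"{name}: {members}")
```

Having several subjects per group, rather than one, is the replication that reduces the effect of random variability on the comparison.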

Problem issues
Several issues may potentially undermine the principles of experimental design:
Unconscious bias of the experimenter or subject. Avoid by double-blind application of treatments.
Lack of realism in the subjects, treatments, or the experimental setting.