Simple Linear Regression: the model, estimation and testing

1 Simple Linear Regression: the model, estimation and testing. Lecture No. 05

2 Example 1 A production manager has compared the dexterity test scores of five assembly-line employees with their hourly productivity.

3 Example 1 The model is y = β₀ + β₁x + ε, where y is the dependent variable, β₀ the intercept, β₁ the slope, x the independent variable, and ε the random error (residual).

4 Simple Linear Regression: the model. The goal of a regression analysis is to obtain predictions of one variable using the known values of another.

5 Simple Linear Regression Three assumptions: The ε term is assumed to be a random variable that (1) has a mean of 0, (2) is normally distributed, and (3) has constant variance at every value of x (homoscedasticity).

6 Simple Linear Regression Three assumptions: For any given value of x, the y values are assumed to be normally distributed about the population regression line and to have the same standard deviation σ. The regression line based on sample data is an estimate of this true line.

7 Example 1 Sample regression line

8 The Least-Squares Criterion The least-squares criterion requires that the sum of the squared deviations between the y values in the scatter diagram and the y values predicted by the equation be minimized. In symbolic terms: minimize Σ(y − ŷ)², where ŷ is the value predicted by the regression equation.

9 Determining the Least-Squares Regression Line
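The formulas on this slide did not survive transcription. As a hedged reconstruction, these are the standard least-squares expressions for the sample slope and intercept; the symbols b_0, b_1, \bar{x} and \bar{y} are notation assumed here, chosen to match the b 1 used later in the slides.

```latex
\[
b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
b_0 = \bar{y} - b_1 \bar{x},
\qquad
\hat{y} = b_0 + b_1 x
\]
```

These values of b_0 and b_1 are the ones that minimize the sum of squared deviations \sum (y_i - \hat{y}_i)^2 stated in the least-squares criterion.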

10 Example 1

11 Example 1

12 Example 1 - Point Estimates Using the Regression Line If a job applicant were to score x = 15 on the manual dexterity test, we would predict this person would be capable of producing 64.2 units per hour on the assembly line.
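The point estimate above can be sketched in a few lines of Python. The data table from the slides was lost in transcription, so the five (x, y) pairs below are assumed illustrative values, chosen only because they reproduce the 64.2 units per hour quoted on this slide; the function name `fit_line` is likewise hypothetical.

```python
# ASSUMED illustrative data: not taken from the slides (the original table is missing).
x = [12, 14, 17, 16, 11]   # dexterity test scores (assumed)
y = [55, 63, 67, 70, 51]   # units produced per hour (assumed)


def fit_line(x, y):
    """Return (b0, b1) of the least-squares line y_hat = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-products and sum of squares about the means
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = fit_line(x, y)
y_hat_15 = b0 + b1 * 15   # point estimate for an applicant scoring x = 15
print(f"y_hat = {b0:.1f} + {b1:.1f} x; prediction at x = 15: {y_hat_15:.1f}")
# prints: y_hat = 19.2 + 3.0 x; prediction at x = 15: 64.2
```

With different data the same function gives the corresponding line; only the prediction of 64.2 at x = 15 is confirmed by the slides.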

13 Estimation of standard error To develop interval estimates for the dependent variable, we must first determine the standard error of estimate. This is a standard deviation describing the dispersion of data points above and below the regression line. The formula for the standard error of estimate is shown below and is very similar to that for determining a sample standard deviation s:
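The formula referred to on this slide is missing from the transcription. A hedged reconstruction of the usual definition follows; the symbol s_{y.x} is notation assumed here, and the divisor n - 2 matches the df = n - 2 used later in the slides.

```latex
\[
s_{y.x} = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - 2}}
\]
```

The resemblance to the sample standard deviation is that deviations are taken from the fitted line \hat{y}_i instead of from the mean, and two degrees of freedom are lost (one each for b_0 and b_1) instead of one.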

14 Example 1 A production manager has compared the dexterity test scores of five assembly-line employees with their hourly productivity.

15 Example 1 Now calculate the standard error of estimate as
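The calculation on this slide was an image and is not available, so the following is a minimal sketch only. It uses assumed illustrative data (not the slide's table) that is consistent with the 64.2 prediction quoted earlier, and divides the sum of squared residuals by n - 2.

```python
import math

# ASSUMED illustrative data: not taken from the slides (the original table is missing).
x = [12, 14, 17, 16, 11]   # dexterity test scores (assumed)
y = [55, 63, 67, 70, 51]   # units produced per hour (assumed)

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
b0 = y_bar - b1 * x_bar

# Sum of squared residuals about the fitted line
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

# Standard error of estimate: n - 2 degrees of freedom
s_e = math.sqrt(sse / (n - 2))
print(f"SSE = {sse:.2f}, standard error of estimate = {s_e:.3f}")
```

For these assumed inputs the standard error of estimate comes out at roughly 2.76 units per hour.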

16 Confidence and Prediction Intervals for y Given a Specific x Value Given a specific value of x, we can make two kinds of interval estimates regarding y: (1) a confidence interval for the (unknown) true mean of y, and (2) a prediction interval for an individual y observation.

17 Confidence interval for the mean of y given a specific x value
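The formula on this slide is missing from the transcription. The standard expression is reconstructed here as a hedged sketch; x_0 (the specific x value), s_{y.x} (the standard error of estimate) and t (the critical value with n - 2 degrees of freedom) are notation assumed for this block.

```latex
\[
\hat{y} \pm t \, s_{y.x} \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}},
\qquad \hat{y} = b_0 + b_1 x_0
\]
```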

18 Example 1 Confidence Interval For persons scoring x = 15 on the dexterity test, what is the 95% confidence interval for their mean productivity? For the 95% level of confidence and df = n − 2 = 3, t = 3.182, and the 95% confidence interval can now be calculated. Based on these calculations, we have 95% confidence that the mean productivity for persons scoring x = 15 on the dexterity test will be between … and … units per hour.
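The interval limits were lost in transcription, so the sketch below does not restore them. It shows how such an interval could be computed, using the t = 3.182 quoted on the slide and assumed illustrative data (not the slide's table) that is consistent with the 64.2 prediction. The helper name `mean_ci` is hypothetical.

```python
import math

# ASSUMED illustrative data: not taken from the slides (the original table is missing).
x = [12, 14, 17, 16, 11]   # dexterity test scores (assumed)
y = [55, 63, 67, 70, 51]   # units produced per hour (assumed)
T_CRIT = 3.182             # from the slide: 95% confidence, df = n - 2 = 3


def mean_ci(x, y, x0, t_crit):
    """Confidence interval for the mean of y at x0 (simple linear regression)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_e = math.sqrt(sse / (n - 2))          # standard error of estimate
    y_hat = b0 + b1 * x0
    half_width = t_crit * s_e * math.sqrt(1 / n + (x0 - x_bar) ** 2 / s_xx)
    return y_hat - half_width, y_hat + half_width


lower, upper = mean_ci(x, y, 15, T_CRIT)
print(f"95% CI for mean productivity at x = 15: {lower:.2f} to {upper:.2f}")
```

Under these assumed inputs the interval is roughly 59.9 to 68.5 units per hour; the slide's own figures may differ if its data differ.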

19 Prediction Interval for an Individual y Observation For a given value of x, the estimation interval for an individual y observation is called the prediction interval. The formula for the prediction interval for an individual y, given a specific value of x, differs from the confidence interval for the mean of y only by an additional 1 under the square root.
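The formula itself is missing from the transcription. A hedged reconstruction of the standard expression follows, with x_0, s_{y.x} and t as notation assumed for this block (t is the critical value with n - 2 degrees of freedom).

```latex
\[
\hat{y} \pm t \, s_{y.x} \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}},
\qquad \hat{y} = b_0 + b_1 x_0
\]
```

The leading 1 under the radical accounts for the scatter of an individual observation about the regression line, which is why this interval is always wider than the confidence interval for the mean at the same x_0.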

20 Example 1 Prediction Interval A prospective employee has scored x = 15 on the dexterity test. What is the 95% prediction interval for his productivity? For this applicant, we have 95% confidence that his productivity as an employee would be between … and … units per hour.

21 Example 1 Prediction Interval The 95% prediction interval for individual y values becomes slightly wider whenever the interval is based on x values that are farther away from the mean of x.
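A minimal sketch of the prediction interval and of how its width grows away from the mean of x. The data are assumed illustrative values (the slide's table is missing), t = 3.182 is the critical value quoted in the slides for df = 3, and the helper name `prediction_interval` is hypothetical.

```python
import math

# ASSUMED illustrative data: not taken from the slides (the original table is missing).
x = [12, 14, 17, 16, 11]   # dexterity test scores (assumed)
y = [55, 63, 67, 70, 51]   # units produced per hour (assumed)
T_CRIT = 3.182             # from the slides: 95% level, df = n - 2 = 3


def prediction_interval(x, y, x0, t_crit):
    """Prediction interval for an individual y observation at x0."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_e = math.sqrt(sse / (n - 2))
    y_hat = b0 + b1 * x0
    # Note the additional 1 under the square root
    half_width = t_crit * s_e * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / s_xx)
    return y_hat - half_width, y_hat + half_width


lower, upper = prediction_interval(x, y, 15, T_CRIT)
print(f"95% PI at x = 15: {lower:.2f} to {upper:.2f}")

# Interval width at the mean of x (14 here) and at points farther away
widths = []
for x0 in (14, 15, 20):
    lo, hi = prediction_interval(x, y, x0, T_CRIT)
    widths.append(hi - lo)
    print(f"x0 = {x0}: width = {hi - lo:.2f}")
```

For these assumed inputs the interval at x = 15 is roughly 54.4 to 74.0 units per hour, and the width is smallest at the mean of x and increases as x0 moves away from it.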

22 Testing and Estimation for the Slope

23 Testing and Estimation for the Slope
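The formulas on these two slides did not survive transcription. The standard expressions for the slope test and the slope interval are reconstructed here as a hedged sketch; s_{b_1} and s_{y.x} are notation assumed for this block, and t in the interval is the critical value with n - 2 degrees of freedom.

```latex
\[
H_0: \beta_1 = 0 \quad \text{versus} \quad H_1: \beta_1 \neq 0
\]
\[
s_{b_1} = \frac{s_{y.x}}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}},
\qquad
t = \frac{b_1 - 0}{s_{b_1}},
\qquad
\text{df} = n - 2
\]
\[
\text{Confidence interval for } \beta_1: \quad b_1 \pm t \, s_{b_1}
\]
```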

24 Example 1 Testing and Estimation for the Slope An equivalent method of testing the significance of the linear relationship is to examine whether the slope β₁ of the population regression line could be zero. For the dexterity test data, the slope of the sample regression line was b₁ = … (1) Using the 0.05 level of significance, examine whether the slope of the population regression line could be zero. (2) Construct the 95% confidence interval for the slope of the population regression line.

25 Example 1 Testing and Estimation for the Slope

26 Example 1 Testing and Estimation for the Slope The p value is below the 0.05 level of significance, so we reject the null hypothesis that β₁ = 0.

27 Confidence interval for the Slope

28 Example 1 Testing and Estimation for the Slope 95% Confidence Interval for the Slope of the Population Regression Line
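The slope test and the slope interval can be sketched together. The numbers on the slides were lost, so the data below are assumed illustrative values (consistent only with the 64.2 prediction quoted earlier), and the decision uses the critical value t = 3.182 from the slides instead of a p value, which keeps the sketch free of third-party libraries.

```python
import math

# ASSUMED illustrative data: not taken from the slides (the original table is missing).
x = [12, 14, 17, 16, 11]   # dexterity test scores (assumed)
y = [55, 63, 67, 70, 51]   # units produced per hour (assumed)
T_CRIT = 3.182             # from the slides: two-tailed, alpha = 0.05, df = 3

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
b0 = y_bar - b1 * x_bar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s_e = math.sqrt(sse / (n - 2))       # standard error of estimate

# Standard error of the slope and test statistic for H0: beta_1 = 0
s_b1 = s_e / math.sqrt(s_xx)
t_stat = b1 / s_b1
reject = abs(t_stat) > T_CRIT        # two-tailed critical-value decision

# 95% confidence interval for the population slope
lower = b1 - T_CRIT * s_b1
upper = b1 + T_CRIT * s_b1

print(f"t = {t_stat:.3f}, reject H0: {reject}")
print(f"95% CI for slope: {lower:.2f} to {upper:.2f}")
```

For these assumed inputs the test statistic is about 5.55 and the interval is roughly 1.28 to 4.72. The interval excludes zero, which agrees with rejecting the null hypothesis.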

29 Example 2 Fifty randomly selected students took a math aptitude test before they began their statistics course. The Statistics Department has three questions: (1) What linear regression equation best predicts statistics performance, based on math aptitude scores? (2) If a student made an 80 on the aptitude test, what grade would we expect him to make in statistics? (3) What are the confidence and prediction intervals for x = 80 at the 0.05 level of significance?

30 Example 2 Solution in Excel

31 Example 2

32 Example 2

33 Example 2

34 Example 2 Solution in STATISTICA

35 Example

36 Example

37 Example 2

38 Example 2 another way to plot the graphs

39 Example 2 another way to plot the graphs

40 Example 2 another way to plot the graphs: regression bands (prediction intervals and confidence intervals)

41 Example

42 Example 2 If a student made an 80 on the aptitude test, what grade would we expect him to make in statistics? Construct the confidence and prediction intervals for x = 80 at the 0.05 level of significance.

43 Example 2