Chapter 8 Estimating with Confidence. Lesson 2: Estimating a Population Proportion


What proportion of the beads are yellow? In your groups, you will find a 95% confidence interval for the true proportion of yellow beads. Remember: you will be using the data from the sample you have, and since every group has a different sample statistic, we have to apply the ideas of sampling distributions. What are those again?

What are the big ideas about sampling distributions for proportions? SOCS: Shape: (skip O) Center: Spread:

So what proportion of beads are yellow?
1. Each group will take a simple random sample of beads. Separate the beads into two groups: those that are yellow and those that are not.
2. Determine the point estimate p-hat for the unknown population proportion p of yellow beads in the container.
3. In your groups, find a 95% confidence interval for the true proportion p of yellow beads. Hint: does the number 95% sound familiar?
4. Compare the results with the other groups.

Conditions for Estimating p
These are the conditions you are expected to check before calculating a confidence interval:
1. Random: The data come from a well-designed random sample or randomized experiment.
2. Large Counts: Both n(p-hat) and n(1 - p-hat) are at least 10.
   What does this ensure?

Example 1: Check that the conditions for constructing a confidence interval for p are met.
a. Glenn wonders what proportion of the students at his college believe that tuition is too high. He interviews an SRS of 50 of the 2400 students at his college. Thirty-eight of those interviewed think tuition is too high.
Random? Yes, SRS.
Large Counts? Yes: n(p-hat) = 38 and n(1 - p-hat) = 12 are both at least 10.

Example 1: Check that the conditions for constructing a confidence interval for p are met.
b. The small round holes you often see in sea shells were drilled by other sea creatures, who ate the former dwellers of the shells. Whelks often drill into mussels, but this behavior appears to be more or less common in different locations. Researchers collected whelk eggs from the coast of Oregon, raised the whelks in the laboratory, then put each whelk in a container with some delicious mussels. Only 9 of 98 whelks drilled into a mussel. The researchers want to estimate the proportion p of Oregon whelks that will spontaneously drill into mussels.
Random? Maybe. We do not know if the eggs were a random sample.
Large Counts? No: n(p-hat) = 9 is less than 10.
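The Large Counts check in Example 1 is simple enough to script. Here is a minimal sketch in Python; the function name `large_counts` is made up for illustration and is not part of any library.

```python
def large_counts(successes: int, n: int, threshold: int = 10) -> bool:
    """Return True when both n*p-hat and n*(1 - p-hat) are at least `threshold`.

    n*p-hat is just the number of successes in the sample, and
    n*(1 - p-hat) is the number of failures, so we can compare counts directly.
    """
    failures = n - successes
    return successes >= threshold and failures >= threshold


# Example 1a: 38 of 50 students think tuition is too high (counts 38 and 12).
print("Glenn's survey:", large_counts(38, 50))

# Example 1b: 9 of 98 whelks drilled into a mussel (counts 9 and 89).
print("Whelks:", large_counts(9, 98))
```

Note that this only covers the Large Counts condition. Whether the data are Random has to be judged from how they were collected.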

What happens if one of the conditions is violated? If the data do not come from an SRS or a randomized experiment, then there's no point in inference, as this violation limits our ability to form any conclusions about the population. When the Large Counts condition is violated, the actual capture rate will be lower than the stated confidence level.

General formula for constructing a confidence interval for an unknown population proportion p:
Statistic ± (critical value) · (standard deviation of statistic)
Statistic: The sample proportion p-hat is the statistic we use to estimate p.
Critical value: The z* value that marks off the confidence level. 90%: z* = 1.645; 95%: z* = 1.96; 99%: z* = 2.575.
Standard deviation of statistic: Also known here as the standard error (SE) of p-hat, an estimate of the standard deviation of the sampling distribution of p-hat. Formula: $SE_{\hat{p}} = \sqrt{\hat{p}(1-\hat{p})/n}$. The standard error of p-hat estimates how much p-hat typically varies from p.

How do I find the critical values? 1. Draw a picture of a normal distribution. Label the area within the confidence interval. Identify the area in the individual tails. 2. From your diagram, identify the amount of area to the left of the positive z-value. 3. Use your table: Find the z that corresponds to this amount of total area (to the left of the positive z)

Example 2: Find the critical value z* for an 80% confidence interval. Assume that the Large Counts condition is met. z* = 1.28

Example 3: Find the critical value z* for a 96% confidence interval. Assume that the Large Counts condition is met. z* = 2.05
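If you would rather check a critical value with software than with the table, the picture-based recipe translates directly: find the tail area, then ask for the z with that much area to its left. This is a small sketch using Python's standard library; the helper name `critical_value` is hypothetical.

```python
from statistics import NormalDist


def critical_value(confidence: float) -> float:
    """Return z* so that `confidence` of the standard Normal area lies between -z* and +z*."""
    tail_area = (1 - confidence) / 2           # area in each individual tail
    area_to_left = 1 - tail_area               # area to the left of the positive z
    return NormalDist().inv_cdf(area_to_left)  # inverse of the standard Normal CDF


for level in (0.80, 0.90, 0.95, 0.96, 0.99):
    print(f"{level:.0%} confidence: z* = {critical_value(level):.3f}")
```

Software reports the 99% value as about 2.576. The table-based 2.575 comes from averaging two neighboring table entries, and either is close enough for our purposes.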

How to calculate a confidence interval for p
When the Random and Large Counts conditions are met, a C% confidence interval for the population proportion p is
Point estimate ± margin of error: $\hat{p} \pm z^* \sqrt{\hat{p}(1-\hat{p})/n}$
where z* is the critical value for the standard Normal curve with C% of its area between -z* and +z*.
Your interval will look like: ___ < p < ___

Example 4 According to a recent study by the Annenberg Foundation, only 36% of adults in the United States could name all three branches of government. This was based on a survey given to a random sample of 1416 adults. 1. Show that the conditions for calculating a confidence interval for a proportion are satisfied. 2. Calculate a 99% confidence interval for the proportion of all U.S. adults who could name all three branches of government. 3. Interpret the interval from Question 2.
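One way to check your work on Question 2 is to code the interval formula directly. This is a sketch, not an official solution: the function name `one_proportion_z_interval` is invented here, and it uses the software value of z* (about 2.576) rather than the table value.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_interval(p_hat: float, n: int, confidence: float) -> tuple[float, float]:
    """Return (lower, upper) for p-hat ± z* · sqrt(p-hat(1 - p-hat)/n)."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    standard_error = sqrt(p_hat * (1 - p_hat) / n)
    margin_of_error = z_star * standard_error
    return p_hat - margin_of_error, p_hat + margin_of_error


n, p_hat = 1416, 0.36

# Large Counts: both expected counts should be at least 10.
print("n*p-hat =", round(n * p_hat, 2), " n*(1 - p-hat) =", round(n * (1 - p_hat), 2))

lower, upper = one_proportion_z_interval(p_hat, n, 0.99)
print(f"99% CI: {lower:.3f} to {upper:.3f}")
```

The interval still needs an interpretation in context (Question 3); the code only handles the "Do" step.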

Confidence Intervals in a 4 Step Process: Statistics Problems Demand Consistency 1. State: What parameter do you want to estimate, and at what confidence level? 2. Plan: Identify the appropriate inference method: check conditions. 3. Do: If the conditions are met, perform calculations. 4. Conclude: Interpret your interval in the context of the problem.

Example 4 In her first-grade social studies class, Jordan learned that 70% of Earth's surface was covered in water. She wondered if this was really true and asked her dad for help. To investigate, he tossed an inflatable globe to her 50 times, being careful to spin the globe each time. When she caught it, he recorded where her right finger was pointing. In 50 tosses, her finger was pointing to water 33 times. Construct and interpret a 95% confidence interval for the proportion of Earth's surface that is covered in water.

Example 4 Solutions
State: We want to estimate p = the true proportion of Earth's surface that is covered in water, with 95% confidence.
Plan: Use a one-sample z interval for p if the conditions are met.
Random? Yes.
10%? Don't need to check: the sampling was done with replacement.
Large Counts? n(p-hat) = 33 and n(1 - p-hat) = 17 are both at least 10.
Do: 0.529 < p < 0.791
Conclude: We are 95% confident that the interval from 0.529 to 0.791 captures the true proportion of Earth's surface that is covered in water. This is consistent with the claim that 70% of Earth's surface is covered in water, because 0.70 is one of the plausible values in the interval.
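For reference, the arithmetic behind the "Do" step can be written out as follows, using p-hat = 33/50 and z* = 1.96 (intermediate values are rounded):

```latex
\hat{p} = \frac{33}{50} = 0.66
\qquad
\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}
  = 0.66 \pm 1.96 \sqrt{\frac{(0.66)(0.34)}{50}}
  \approx 0.66 \pm 1.96\,(0.0670)
  \approx 0.66 \pm 0.131
\quad\Longrightarrow\quad
0.529 < p < 0.791
```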

Finally: Sample size: what determines how big a sample size to use? The size of the margin of error (ME) you want determines the minimum sample size you'll need. The ME involves the sample proportion of successes, p-hat, which we don't know before taking the sample. There are two options:
1. Use a guess for p-hat based on past experience or a previous study.
2. Use p-hat = 0.5 as the guess. The ME is largest at this value, providing a conservative estimate.

Sample size for desired margin of error
To determine the sample size n that will yield a C% confidence interval for a population proportion p with a maximum margin of error ME, solve the following inequality for n:
$z^* \sqrt{\hat{p}(1-\hat{p})/n} \le ME$
where p-hat is a guessed value for the sample proportion. The margin of error will always be less than or equal to ME if you use a p-hat value of 0.5.

Example 5 Suppose that you want to estimate p, the true proportion of students at your school who have a tattoo, with 95% confidence and a margin of error of no more than 0.10. How large a sample is needed?

Example 5 Solution
Identify variables:
p-hat: we don't know, so use 0.5
z*: 1.96
ME: 0.10
Solve for n using your algebra skills: n ≥ 96.04, so round up to 97.
Sentence: We need to survey at least 97 students to estimate the true proportion of students with a tattoo with 95% confidence and a margin of error of at most 0.10.
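The algebra rearranges the margin-of-error inequality to n ≥ p-hat(1 - p-hat)(z*/ME)², and then we always round up. A short sketch of that calculation follows; `required_sample_size` is a hypothetical helper name, and it defaults to the conservative guess p-hat = 0.5.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(margin_of_error: float, confidence: float, p_guess: float = 0.5) -> int:
    """Smallest n with z* · sqrt(p_guess(1 - p_guess)/n) <= margin_of_error."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n_exact = p_guess * (1 - p_guess) * (z_star / margin_of_error) ** 2
    return ceil(n_exact)  # always round UP: rounding down would exceed the ME


# Example 5: 95% confidence, margin of error at most 0.10, no prior guess for p-hat.
print(required_sample_size(0.10, 0.95))
```

Rounding up matters here: a sample of 96 would give a margin of error slightly larger than 0.10.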

What is my mystery μ?
What do we know about the population distribution?
Normally distributed
σ = 20
N = ___
μ = ?

Where to start? Rarely do we know/find out what the mean of a population is. But what would you use to estimate what you think μ is? A point estimator is a statistic that provides an estimate of a population parameter. The value of that statistic from a sample is called a point estimate.

For example

Parameter        Point estimator (statistic)    Point estimate
μ                x-bar                          The calculated sample mean
σ                s                              The standard deviation of the sample
p                p-hat                          The proportion of successes
Any parameter    The corresponding statistic

Example 1: In each of the following settings, determine the point estimator you would use and calculate the value of the point estimate.
a. The makers of a new golf ball want to estimate the median distance the new balls will travel when hit by a mechanical driver. They select a random sample of 10 balls and measure the distance each ball travels after being hit by the mechanical driver. Here are the distances (in yards): 285 286 284 285 282 284 287 290 288 285
Point Estimator: The sample median, to estimate the population median
Point Estimate: The sample median is 285 yards

Example 1: In each of the following settings, determine the point estimator you would use and calculate the value of the point estimate.
b. The golf ball manufacturer would also like to investigate the variability of the distance traveled by the golf balls by estimating the interquartile range.
Point Estimator: The sample IQR, as a point estimator for the population IQR
Point Estimate: 3 yards (Q3 - Q1 = 287 - 284)

Example 1: In each of the following settings, determine the point estimator you would use and calculate the value of the point estimate.
c. The math department wants to know what proportion of its students own a graphing calculator, so they take a random sample of 100 students and find that 28 own a graphing calculator.
Point Estimator: p-hat, as a point estimator for the population proportion p
Point Estimate: p-hat = 28/100 = 0.28
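The three point estimates in Example 1 can be reproduced with a few lines of Python. This sketch finds the quartiles the way we do by hand (the median of the lower half and of the upper half); the helper `quartiles` is written here for illustration, and statistical software may use a slightly different quartile rule.

```python
from statistics import median

# Golf ball distances from Example 1a (yards).
distances = [285, 286, 284, 285, 282, 284, 287, 290, 288, 285]


def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves of the sorted data."""
    ordered = sorted(values)
    half = len(ordered) // 2      # for an odd count, the middle value is left out of both halves
    return median(ordered[:half]), median(ordered[-half:])


sample_median = median(distances)   # part a
q1, q3 = quartiles(distances)
sample_iqr = q3 - q1                # part b
p_hat = 28 / 100                    # part c: 28 of 100 students own a graphing calculator

print("median:", sample_median, "yards")
print("IQR:", sample_iqr, "yards")
print("p-hat:", p_hat)
```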

Back to the mystery mu
Population Distribution: Normally distributed, σ = 20, μ = ?
Sampling Distribution: n = 16, point estimator for μ = ___, σ_x-bar = ___
The question is, how would the sample mean x-bar vary if we took many SRSs of size 16 from this same population?
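As a reminder from the sampling distributions chapter, and assuming the population is much larger than the sample, the standard deviation of the sample mean for SRSs of size 16 works out as:

```latex
\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{20}{\sqrt{16}} = 5
```

So the sample mean x-bar varies around the unknown μ with a standard deviation of 5, and because the population is Normal, the sampling distribution of x-bar is Normal too.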

Definitions
Confidence Interval: A C% confidence interval gives an interval of plausible values for a parameter. The interval is calculated from the data and has the form point estimate ± margin of error.
Margin of Error: The difference between the point estimate and the true parameter value will be less than the margin of error in C% of all samples.
Confidence Level, C: The confidence level C gives the overall success rate of the method for calculating the confidence interval. That is, in C% of all possible samples, the method would yield an interval that captures the true parameter value.
How to interpret a confidence interval: To interpret a C% confidence interval for an unknown parameter, say, "We are C% confident that the interval from ___ to ___ captures the [parameter in context]."

Interpreting Confidence Intervals with Caution Rule #1: The confidence level tells how likely it is that the method will produce an interval that captures the population parameter if we use it many times. It is the overall capture rate. It does not tell us the chance that one particular interval captures the parameter. The interval itself provides a set of plausible values for the parameter.

Interpreting Confidence Intervals with Caution Rule #2: The confidence level is NOT the probability that a particular interval has captured the parameter. Before the interval is calculated, we have a 95% chance (for example) of getting a sample mean that's within about 2 standard deviations of the sampling distribution (2σ_x-bar) of μ, which would lead to a confidence interval that captures μ. After the confidence interval is constructed, it either does or does not contain μ, which corresponds to a probability of 100% (the interval contains μ) or 0% (the interval does not contain μ).

Interpreting Confidence Intervals with Caution Rule #3: When interpreting a confidence interval, make it clear that you are estimating a parameter of a population, not a statistic from a sample.
Yes: Based on the sample, we believe that the population mean is somewhere between ___ and ___.
Not so much: We are 95% confident that the interval from ___ to ___ contains the sample proportion.

Interpreting Confidence Intervals with Caution Rule #4: Talk in the future tense, not in the past.
No: We are 95% confident that the interval from ___ to ___ captures the true proportion of US adults who said ___.
Yes: We are 95% confident that the interval from ___ to ___ captures the true proportion of US adults who would say ___.

Example 2 A large company is concerned that many of its employees are in poor physical condition, which can result in decreased productivity. To determine how many steps each employee takes per day, on average, the company provides a pedometer to 50 randomly selected employees to use for one 24-hour period. After collecting the data, the company statistician reports a 95% confidence interval of 4547 steps to 8473 steps.
a. Interpret the confidence interval.
We are 95% confident that the interval from 4547 to 8473 captures the true mean number of steps taken per day for employees at this company.
b. What is the point estimate that was used to create the interval? What is the margin of error?
Point estimate: 6510 steps (midpoint of the interval). Margin of error: 1963 steps (half the width of the interval).
c. Recent guidelines suggest that people aim for 10,000 steps per day. Is there convincing evidence that the employees of this company are not meeting the guideline, on average? Explain.
There is convincing evidence that the employees are not meeting the guideline because all of the values in the interval are less than 10,000 steps.
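Part b works backwards from the interval: since the interval is point estimate ± margin of error, the point estimate is the midpoint and the margin of error is half the width. A tiny sketch of that idea, with an illustrative function name:

```python
def estimate_and_margin(lower: float, upper: float) -> tuple[float, float]:
    """Recover (point estimate, margin of error) from the endpoints of a symmetric interval."""
    point_estimate = (lower + upper) / 2    # midpoint of the interval
    margin_of_error = (upper - lower) / 2   # half the width of the interval
    return point_estimate, margin_of_error


estimate, margin = estimate_and_margin(4547, 8473)
print("point estimate:", estimate, "steps")
print("margin of error:", margin, "steps")
```

This only works for intervals built as estimate ± margin. It would not apply to an interval that is not symmetric about its point estimate.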

Let's explore confidence intervals

Example 3 How much does the fat content of Brand X hot dogs vary? To find out, researchers measured the fat content (in grams) of a random sample of 10 Brand X hot dogs. A 95% confidence interval for the population standard deviation σ is 2.84 to 7.55.
a. Interpret the confidence interval.
b. Interpret the confidence level.
c. True or false: the interval from 2.84 to 7.55 has a 95% chance of containing the actual population standard deviation σ. Justify.

Example 3 Solutions
a. Interpret the confidence interval.
We are 95% confident that the interval from 2.84 g to 7.55 g captures the population standard deviation of the fat content of Brand X hot dogs.
b. Interpret the confidence level.
If we took many random samples of 10 Brand X hot dogs and built a 95% confidence interval from each one, about 95% of those intervals would capture the true standard deviation of the fat content of Brand X hot dogs.
c. True or false: the interval from 2.84 to 7.55 has a 95% chance of containing the actual population standard deviation σ. Justify.
False: the interval either does or does not contain the population standard deviation (a probability of 1 or 0, respectively).

Exploring Confidence Intervals Play around with the app, and be ready to summarize: 1. Explain how changing the confidence level affects the confidence interval. 2. Explain how changing the sample size affects the length of the confidence interval. 3. Does increasing the sample size increase the capture rate (percent hit)?

My two cents solution
1. Explain how changing the confidence level affects the confidence interval.
Increasing the confidence level widens the confidence interval. Our interval of plausible values for the parameter depends on our level: the wider the interval, the less precise the estimate, but the more likely that the true parameter will be captured.
2. Explain how changing the sample size affects the length of the confidence interval.
The larger the sample size, the shorter the interval and the more precise the estimate of the parameter.
3. Does increasing the sample size increase the capture rate (percent hit)?
The sample size does not affect the capture rate. Increasing the sample size does NOT make us more confident; it just makes for a more precise estimate.
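If you don't have the app handy, a rough simulation shows the same three patterns. This sketch is not the app: it assumes a Normal population with σ = 20 (as in the mystery μ slides) and a made-up true mean of 100, and it builds z intervals for a mean, x-bar ± z*·σ/√n. The names `MU`, `SIGMA`, and `simulate` are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

MU, SIGMA = 100.0, 20.0   # hypothetical population: the "true" mean is an assumption


def simulate(n: int, confidence: float, reps: int = 2000, seed: int = 1):
    """Return (interval width, capture rate) over `reps` simulated samples of size n."""
    rng = random.Random(seed)
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * SIGMA / sqrt(n)       # same margin for every sample since sigma is known
    hits = 0
    for _ in range(reps):
        sample_mean = fmean([rng.gauss(MU, SIGMA) for _ in range(n)])
        if abs(sample_mean - MU) <= margin:  # the interval captures MU
            hits += 1
    return 2 * margin, hits / reps


for n in (16, 64):
    for confidence in (0.90, 0.95, 0.99):
        width, rate = simulate(n, confidence)
        print(f"n = {n:2d}, C = {confidence:.0%}: width = {width:5.1f}, capture rate = {rate:.1%}")
```

In the printout, the width should grow with the confidence level and shrink with the sample size, while the capture rate should stay close to the confidence level no matter what n is.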

Calculating a confidence interval Generally, the confidence interval for estimating a population parameter has the form Statistic ± (critical value) (standard deviation of statistic) The critical value basically is the number of standard deviations that makes the interval wide enough to have the stated capture rate. The product of the critical value and standard deviation is the margin of error.

Margin of error
The margin of error depends on
1. The critical value: greater confidence requires a larger critical value.
2. The standard deviation of the statistic: this depends on the sample size n. Larger samples give more precise estimates, which means less variability in the statistic.