6 Sampling Distributions and Confidence Intervals

Inferential statistics is used to draw conclusions about a large set of data, called the population, based on a subset of the data, called the sample.

6.1 Sampling Distributions

Sampling distributions, when combined with the probability and probability distribution concepts of the previous two chapters, provide the theoretical justification that enables you to arrive at conclusions about an entire population based only on a single sample.

A statistic is a numerical measure of a sample (mean, variance, standard deviation, or proportion). A parameter is a numerical measure of a population.

Sampling Distribution
The distribution of a sample statistic, such as the mean or the proportion, for all possible samples of a given size.

Common behaviors are:
Every sample statistic has a sampling distribution.
A specific sample statistic is used to estimate its corresponding population characteristic.
Each sample statistic of interest is associated with a specific sampling distribution.

Some commonly used statistics

Measure               Statistic    Parameter
Mean                  $\bar{x}$    $\mu$
Variance              $s^2$        $\sigma^2$
Standard deviation    $s$          $\sigma$
Proportion            $p$          $\pi$

Sampling Distribution of the Mean and the Central Limit Theorem

The mean is the most widely used measure in statistics, but an individual extreme value can distort the mean. To overcome this, statisticians have developed the central limit theorem. This theorem states that, regardless of the shape of the distribution of the individual values in the population, as the sample size gets large enough, the sampling distribution of the mean can be approximated by a normal distribution. "Large enough" is generally accepted by statisticians as 30 or more. However, you can use the normal distribution for smaller sample sizes if the population is known to be normally distributed.

Sampling Distribution of the Proportion

We use the binomial distribution to determine probabilities for categorical variables that have only two categories, traditionally labeled success and failure.
The normal distribution can be used to approximate the binomial distribution when the number of successes and the number of failures are each at least five.
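The central limit theorem described above can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the population is taken to be exponential with mean 1 and standard deviation 1 (a strongly skewed shape chosen for this example, not taken from the notes), and the function name and seed are hypothetical. It draws many samples of size 30 and summarizes the resulting sample means.

```python
import math
import random
import statistics


def simulate_sample_means(n, num_samples, seed=0):
    """Return the means of num_samples samples of size n.

    Assumption for illustration: the population is exponential with
    rate 1, so its mean is 1 and its standard deviation is 1.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(num_samples):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means


means = simulate_sample_means(n=30, num_samples=5000)
mean_of_means = statistics.fmean(means)
sd_of_means = statistics.stdev(means)

# The sample means should center on the population mean (1.0) and
# spread out by roughly sigma / sqrt(n) = 1 / sqrt(30).
print("mean of sample means:", round(mean_of_means, 3))
print("std dev of sample means:", round(sd_of_means, 3))
print("sigma / sqrt(n):", round(1 / math.sqrt(30), 3))
```

Although the individual values are far from normal, a histogram of `means` would look roughly bell-shaped, which is the behavior the theorem describes.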
6.2 Sampling Error and Confidence Intervals

Taking one sample and computing the results of a sample statistic, such as the mean, creates a point estimate of the population parameter. This single estimate will almost certainly be different if another sample is selected. For example, consider the following table, which records the results of taking 20 samples of n = 15 selected from a population of N = 200 order-filling times. This population has a population mean $\mu$ = 69.637 and a population standard deviation $\sigma$ = 10.411.

Sample    Mean     Standard Deviation    Minimum    Median    Maximum    Range
1         66.12    9.21                  47.20      65.00     87.00      39.80
2         73.30    12.48                 52.40      71.10     101.10     48.70
3         68.67    10.78                 54.00      69.10     85.40      31.40
4         69.95    10.57                 54.50      68.00     87.80      33.30
5         73.27    13.56                 54.40      71.80     101.10     46.70
6         69.27    10.04                 50.10      70.30     85.70      35.60
7         66.75    9.38                  52.40      67.30     82.60      30.20
8         68.72    7.62                  54.50      68.80     81.50      27.00
9         72.42    9.97                  50.10      71.90     88.90      38.80
10        69.25    10.68                 51.10      66.50     85.40      34.30
11        72.56    10.60                 60.20      69.10     101.10     40.90
12        69.48    11.67                 49.10      69.40     97.70      48.60
13        64.65    9.71                  47.10      64.10     78.50      31.40
14        68.85    14.42                 46.80      69.40     88.10      41.30
15        67.91    8.34                  52.40      69.40     79.60      27.20
16        66.22    10.18                 51.00      66.40     85.40      34.40
17        68.17    8.18                  54.20      66.50     86.10      31.90
18        68.73    8.50                  57.70      66.10     84.40      26.70
19        68.57    11.08                 47.10      70.40     82.60      35.50
20        75.80    12.49                 56.70      77.10     101.10     44.40

From these results, you can observe the following:
The sample statistics differ from sample to sample. The sample means vary from 64.65 to 75.80, the sample standard deviations vary from 7.62 to 14.42, the sample medians vary from 64.10 to 77.10, and the sample ranges vary from 26.70 to 48.70.
Some of the sample means are higher than the population mean of 69.637, and some of the sample means are lower than the population mean.
Some of the sample standard deviations are higher than the population standard deviation of 10.411, and some of the sample standard deviations are lower than the population standard deviation.
The variation in the sample range from sample to sample is much greater than the variation in the sample standard deviation.

Sampling Error
The variation that occurs due to selecting a single sample from the population.

The size of the sampling error is primarily based on the variation in the population itself and on the size of the sample selected. Larger samples will have less sampling error, but will be more costly to take.

In practice, only one sample is used as the basis for estimating a population parameter. To account for the differences in the results from sample to sample, statisticians have developed the concept of a confidence interval estimate, which indicates the likelihood that a stated interval with a lower and upper limit properly estimates the parameter.

Confidence Interval Estimate
An estimate of a population parameter stated as a range with a lower and upper limit with a specific degree of certainty.

There is a trade-off between the level of confidence and the width of the interval. For a given sample size, if you want more confidence that your interval will be correct, you will have a wider interval and therefore a less precise estimate. The most common percentage used is 95%. If more confidence is needed, 99% is typically used; if less confidence is needed, 90% is typically used. Because of this trade-off, the degree of certainty, or confidence, must always be stated when reporting an interval estimate.

When you hear "an interval estimate with 95% confidence," or simply "a 95% confidence interval estimate," you can conclude that if all possible samples of the same size were selected, 95% of them would include the population parameter somewhere within the interval and 5% would not.
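The "95% of all possible samples" interpretation can be checked empirically. The following sketch is an illustration only and rests on assumptions that are not in the notes: it treats the order-filling population as normal with the stated mean and standard deviation, builds each interval as the sample mean plus or minus 1.96 standard errors (the known-sigma interval used in the worked-out problems that follow), and uses a hypothetical function name, seed, and number of trials.

```python
import math
import random


def coverage_rate(mu, sigma, n, z, trials, seed=0):
    """Fraction of simulated intervals that contain the true mean mu.

    Assumption for illustration: individual values are drawn from a
    normal population with mean mu and standard deviation sigma.
    """
    rng = random.Random(seed)
    half_width = z * sigma / math.sqrt(n)  # z times the standard error
    hits = 0
    for _ in range(trials):
        sample_mean = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if sample_mean - half_width <= mu <= sample_mean + half_width:
            hits += 1
    return hits / trials


rate = coverage_rate(mu=69.637, sigma=10.411, n=15, z=1.96, trials=4000)
print("share of intervals containing the population mean:", rate)
```

The printed share should be close to 0.95, but it will not equal it exactly because only a finite number of samples is simulated.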
WORKED-OUT PROBLEM 1
You want to develop 95% confidence interval estimates for the mean from the 20 samples of size 15 for the order-filling data presented in the preceding table. Unlike most real-life problems, the population mean, $\mu$ = 69.637, and the population standard deviation, $\sigma$ = 10.411, are already known. Calculate the 95% interval around the population mean within which the sample means are expected to fall.

Solution
Confidence interval estimate:

$\bar{X} \pm Z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$

where $\bar{X}$ is the point estimate, $Z_{\alpha/2}$ is the normal distribution critical value for a probability of $\alpha/2$ in each tail, and $\sigma/\sqrt{n}$ is the standard error. Commonly used confidence levels are 90%, 95%, and 99%.

$\bar{X} - Z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + Z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$

Because $\mu$ is known here, the interval is centered on $\mu$:

$69.637 - 1.96 \frac{10.411}{\sqrt{15}} \le \bar{X} \le 69.637 + 1.96 \frac{10.411}{\sqrt{15}}$

$64.37 \le \bar{X} \le 74.91$

So the interval (64.37, 74.91) contains 95% of the sample means. Sample 20, with a mean of 75.80, is not in the interval.

WORKED-OUT PROBLEM 2
A population has a mean of $\mu$ = 368 and a standard deviation of $\sigma$ = 15. If you take a sample of size n = 25, what is the 95% confidence interval estimate for the mean?

Solution: $368 \pm 1.96 \frac{15}{\sqrt{25}} = 368 \pm 5.88$, which gives (362.12, 373.88).

If the population mean $\mu$ is unknown, the sample mean $\bar{X}$ is used in its place as the center of the interval.

WORKED-OUT PROBLEM 3
A sample of 11 circuits from a large normal population has a mean resistance of 2.20 ohms. We know from past testing that the population standard deviation is 0.35 ohms. Determine a 95% confidence interval for the true mean resistance of the population.

Solution: $2.20 \pm 1.96 \frac{0.35}{\sqrt{11}} = 2.20 \pm 0.2068$, so

$1.9932 \le \mu \le 2.4068$

We are 95% confident that the true mean resistance is between 1.9932 and 2.4068 ohms. Although the true mean may or may not be in this interval, 95% of intervals formed in this manner will contain the true mean.
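The known-sigma interval used in these worked-out problems can be sketched in a few lines of Python. This is a minimal illustration, not part of the original notes: the function name `z_interval` is hypothetical, and the critical value comes from the standard library's `NormalDist` rather than a printed table, so it is 1.95996... instead of the rounded 1.96.

```python
import math
from statistics import NormalDist


def z_interval(center, sigma, n, confidence=0.95):
    """Interval center +/- Z_{alpha/2} * sigma / sqrt(n) for a known sigma."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 95%
    half_width = z * sigma / math.sqrt(n)
    return center - half_width, center + half_width


# Worked-out problem 2: mu = 368, sigma = 15, n = 25
low2, high2 = z_interval(368, 15, 25)
print(round(low2, 2), round(high2, 2))  # 362.12 373.88

# Worked-out problem 3: mean = 2.20 ohms, sigma = 0.35 ohms, n = 11
low3, high3 = z_interval(2.20, 0.35, 11)
print(round(low3, 4), round(high3, 4))  # 1.9932 2.4068
```

Passing `confidence=0.90` or `confidence=0.99` gives the narrower and wider intervals that the trade-off between confidence and precision predicts.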
6.3 Confidence Interval Estimate for the Mean Using the t Distribution ($\sigma$ Unknown)

The most common confidence interval estimate involves estimating the mean of a population. In virtually all cases, the population mean is estimated from sample data in which only the sample mean and sample standard deviation, and not the population standard deviation, are known. To overcome this complication, statisticians have developed the t distribution.

t Distribution
The sampling distribution that allows you to develop a confidence interval estimate of the mean using the sample standard deviation. The t distribution assumes that the variable being studied is normally distributed.

$\bar{X} - t_{\alpha/2,\,df} \frac{s}{\sqrt{n}} \le \mu \le \bar{X} + t_{\alpha/2,\,df} \frac{s}{\sqrt{n}}$

Here df means degrees of freedom and is calculated as n - 1. (Page 316)

Critical values of t by upper-tail area

Degrees of
Freedom    0.25      0.10      0.05      0.025      0.01       0.005
1          1.0000    3.0777    6.3138    12.7062    31.8207    63.6574
2          0.8165    1.8856    2.9200    4.3027     6.9646     9.9248
3          0.7649    1.6377    2.3534    3.1824     4.5407     5.8409
4          0.7407    1.5332    2.1318    2.7764     3.7469     4.6041
5          0.7267    1.4759    2.0150    2.5706     3.3649     4.0322
6          0.7176    1.4398    1.9432    2.4469     3.1427     3.7074
15         0.6912    1.3406    1.7531    2.1315     2.6025     2.9467
20         0.6870    1.3253    1.7247    2.0860     2.5280     2.8453
50         0.6793    1.2984    1.6753    2.0076     2.4017     2.6757

WORKED-OUT PROBLEM 4
You want to undertake a study that compares the cost of a restaurant meal in a major city to the cost of a similar meal in the suburbs outside the city. You collect data about the cost of a meal per person from a sample of 50 city restaurants and 50 suburban restaurants as follows:

City Cost Data
13 21 22 22 24 25 26 26 26 26
30 32 33 34 34 35 35 35 35 36
37 37 39 39 39 40 41 41 41 42
43 44 45 46 50 50 51 51 53 53
53 55 57 61 62 62 62 66 68 75

Suburban Cost Data
21 22 25 25 26 26 27 27 28 28
28 29 31 32 32 35 35 36 37 37
37 38 38 38 39 40 40 41 41 41
42 42 43 44 47 47 47 48 50 50
50 50 50 51 52 53 58 62 65 67

Calculate the 95% confidence intervals for the mean costs.

Solution:

$\bar{X} - t_{\alpha/2,\,df} \frac{s}{\sqrt{n}} \le \mu \le \bar{X} + t_{\alpha/2,\,df} \frac{s}{\sqrt{n}}$

With n = 50 there are df = 49 degrees of freedom, and the critical value for an upper-tail area of 0.025 is 2.0096.

For city cost:

$41.46 - 2.0096 \frac{13.88}{\sqrt{50}} \le \mu \le 41.46 + 2.0096 \frac{13.88}{\sqrt{50}}$

$37.51 \le \mu \le 45.41$

For suburban cost:

$36.80 \le \mu \le 43.12$
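The t-based interval from worked-out problem 4 can be reproduced from the raw data. The sketch below is an illustration only: the function name `t_interval` is hypothetical, and because the Python standard library has no t-distribution quantile function, the critical value 2.0096 (df = 49, upper-tail area 0.025) is passed in by hand, as if read from a t table.

```python
import math
import statistics

city_costs = [
    13, 21, 22, 22, 24, 25, 26, 26, 26, 26,
    30, 32, 33, 34, 34, 35, 35, 35, 35, 36,
    37, 37, 39, 39, 39, 40, 41, 41, 41, 42,
    43, 44, 45, 46, 50, 50, 51, 51, 53, 53,
    53, 55, 57, 61, 62, 62, 62, 66, 68, 75,
]
suburban_costs = [
    21, 22, 25, 25, 26, 26, 27, 27, 28, 28,
    28, 29, 31, 32, 32, 35, 35, 36, 37, 37,
    37, 38, 38, 38, 39, 40, 40, 41, 41, 41,
    42, 42, 43, 44, 47, 47, 47, 48, 50, 50,
    50, 50, 50, 51, 52, 53, 58, 62, 65, 67,
]


def t_interval(data, t_crit):
    """Sample mean +/- t_crit * s / sqrt(n), with s the sample std dev."""
    n = len(data)
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # divides by n - 1
    half_width = t_crit * s / math.sqrt(n)
    return mean - half_width, mean + half_width


T_CRIT_DF49 = 2.0096  # assumed table value for df = 49, 95% confidence

city_low, city_high = t_interval(city_costs, T_CRIT_DF49)
suburban_low, suburban_high = t_interval(suburban_costs, T_CRIT_DF49)
suburban_mean = statistics.mean(suburban_costs)

print("city:", round(city_low, 2), round(city_high, 2))
print("suburban:", round(suburban_low, 2), round(suburban_high, 2))
```

Computing the mean and standard deviation from the data, rather than from rounded summary values, avoids small rounding differences in the interval limits.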
WORKED-OUT PROBLEM 5
A random sample of n = 25 has $\bar{X}$ = 50 and S = 8. Form a 95% confidence interval for $\mu$.

Solution: With df = 24, the critical value is $t_{0.025,\,24}$ = 2.0639, so $50 \pm 2.0639 \frac{8}{\sqrt{25}}$ gives

$46.698 \le \mu \le 53.302$

6.4 Confidence Interval Estimation for Categorical Variables

For a categorical variable, you can develop a confidence interval to estimate the proportion of successes in a given category.

Confidence Interval Estimation for the Proportion ($\pi$)

The sample statistic p follows a binomial distribution that can be approximated by the normal distribution for most studies. This type of confidence interval estimate uses the sample proportion of successes, p (the number of successes divided by the sample size), to estimate the population proportion.

The distribution of the sample proportion is approximately normal if the sample size is large, with standard deviation

$\sigma_p = \sqrt{\frac{\pi(1-\pi)}{n}}$

We will estimate this with sample data:

$\sqrt{\frac{p(1-p)}{n}}$, where $p = \frac{X}{n} = \frac{\text{number of items in the sample having the characteristic of interest}}{\text{sample size}}$ and $0 \le p \le 1$

First, standardize p to a Z value with the formula:

$Z = \frac{p - \pi}{\sigma_p} = \frac{p - \pi}{\sqrt{\pi(1-\pi)/n}}$

Upper and lower confidence limits for the population proportion are calculated with the formula

$p - Z_{\alpha/2} \sqrt{\frac{p(1-p)}{n}} \le \pi \le p + Z_{\alpha/2} \sqrt{\frac{p(1-p)}{n}}$

where
$Z_{\alpha/2}$ is the standard normal value for the level of confidence desired
p is the sample proportion
n is the sample size
(Note: you must have np > 5 and n(1 - p) > 5.)

WORKED-OUT PROBLEM 6
A random sample of 100 people shows that 25 are left-handed. Form a 95% confidence interval for the true proportion of left-handers.

Solution:

$p \pm Z_{\alpha/2} \sqrt{p(1-p)/n} = 25/100 \pm 1.96 \sqrt{0.25(0.75)/100} = 0.25 \pm 1.96(0.0433)$

$0.1651 \le \pi \le 0.3349$

We are 95% confident that the true percentage of left-handers in the population is between 16.51% and 33.49%. Although the interval from 0.1651 to 0.3349 may or may not contain the true proportion, 95% of intervals formed from samples of size 100 in this manner will contain the true proportion.
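The confidence limits for a proportion can be sketched as follows. This is an illustrative helper rather than anything from the original notes: the name `proportion_interval` is hypothetical, the critical value is taken from the standard library's `NormalDist` instead of a table, and the check at the top enforces the rule of thumb that np and n(1 - p) should both exceed 5.

```python
import math
from statistics import NormalDist


def proportion_interval(successes, n, confidence=0.95):
    """Return p -/+ Z_{alpha/2} * sqrt(p * (1 - p) / n)."""
    p = successes / n
    if n * p <= 5 or n * (1 - p) <= 5:
        raise ValueError("normal approximation needs np > 5 and n(1 - p) > 5")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


# Worked-out problem 6: 25 left-handed people in a sample of 100
lower, upper = proportion_interval(25, 100)
print(round(lower, 4), round(upper, 4))  # 0.1651 0.3349
```

Raising an error when the sample is too small is one possible design choice; a helper could instead return the interval with a warning, since the rule of thumb is a guideline rather than a hard limit.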
WORKED-OUT PROBLEM 7
You want to estimate the proportion of people who take work with them on vacation. In a recent survey by CareerJournal.com (data extracted from P. Kitchen, "Can't Turn It Off," Newsday, October 20, 2006, pp. F4-F5), 158 of 473 employees responded that they typically took work with them on vacation. Form a 95% confidence interval for the true proportion.

Solution:

$p \pm Z_{\alpha/2} \sqrt{p(1-p)/n} = 0.334 \pm 1.96 \sqrt{0.334(0.666)/473} = 0.334 \pm 1.96(0.0217)$

$0.2915 \le \pi \le 0.3765$

Based on the 95% confidence interval estimate prepared in Microsoft Excel for the proportion of people who take work with them on vacation, you estimate that between 29.15% and 37.65% of people take work with them on vacation.

WORKED-OUT PROBLEM 8
If the true proportion of voters who support Proposition A is $\pi$ = 0.4, what is the probability that a sample of size 200 yields a sample proportion between 0.40 and 0.45?

Solution:

$\sigma_p = \sqrt{\frac{\pi(1-\pi)}{n}} = \sqrt{\frac{0.4(1-0.4)}{200}} = 0.03464$

$P(0.40 \le p \le 0.45) = P\left(\frac{0.40 - 0.40}{0.03464} \le Z \le \frac{0.45 - 0.40}{0.03464}\right) = P(0 \le Z \le 1.44)$

WORKED-OUT PROBLEM 9
If $\pi$ = 0.4 and n = 200, what is $P(0.40 \le p \le 0.45)$?

Solution: Use the standardized normal table:

$P(0 \le Z \le 1.44) = 0.4251$

Test Yourself
Short Answers

1. The sampling distribution of the mean can be approximated by the normal distribution:
(a) as the number of samples gets large enough
(b) as the sample size (number of observations in each sample) gets large enough
(c) as the size of the population standard deviation increases
(d) as the size of the sample standard deviation decreases

2. The sampling distribution of the mean requires ______ sample size to reach a normal distribution if the population is skewed than if the population is symmetrical.
(a) the same
(b) a smaller
(c) a larger
(d) The two distributions cannot be compared.

3. Which of the following is true regarding the sampling distribution of the mean for a large sample size?
(a) It has the same shape and mean as the population.
(b) It has a normal distribution with the same mean as the population.
(c) It has a normal distribution with a different mean from the population.

4.
For samples of n = 30, for most populations, the sampling distribution of the mean will be approximately normally distributed:
(a) regardless of the shape of the population
(b) if the shape of the population is symmetrical
(c) if the standard deviation of the mean is known
(d) if the population is normally distributed

5. For samples of n = 1, the sampling distribution of the mean will be normally distributed:
(a) regardless of the shape of the population
(b) if the shape of the population is symmetrical
(c) if the standard deviation of the mean is known
(d) if the population is normally distributed

6. A 99% confidence interval estimate can be interpreted to mean that:
(a) If all possible samples are taken and confidence interval estimates are developed, 99% of them would include the true population mean somewhere within their interval.
(b) You have 99% confidence that you have selected a sample whose interval does include the population mean.
(c) Both a and b are true.
(d) Neither a nor b is true.

7. Which of the following statements is false?
(a) There is a different critical value for each level of alpha ($\alpha$).
(b) Alpha ($\alpha$) is the proportion in the tails of the distribution that is outside the confidence interval.
(c) You can construct a 100% confidence interval estimate of $\mu$.
(d) In practice, the population mean is the unknown quantity that is to be estimated.

8. Sampling distributions describe the distribution of:
(a) parameters
(b) statistics
(c) both parameters and statistics
(d) neither parameters nor statistics

9. In the construction of confidence intervals, if all other quantities are unchanged, an increase in the sample size will lead to a ______ interval.
(a) narrower
(b) wider
(c) less significant
(d) the same

10. As an aid to the establishment of personnel requirements, the manager of a bank wants to estimate the mean number of people who arrive at the bank during the two-hour lunch period from 12 noon to 2 p.m. The director randomly selects 64 different two-hour lunch periods from 12 noon to 2 p.m. and determines the number of people who arrive for each. For this sample, $\bar{X}$ = 49.8 and S = 5. Which of the following assumptions is necessary in order for a confidence interval to be valid?
(a) The population sampled from has an approximate normal distribution.
(b) The population sampled from has an approximate t distribution.
(c) The mean of the sample equals the mean of the population.
(d) None of these assumptions are necessary.

11. A university dean is interested in determining the proportion of students who are planning to attend graduate school. Rather than examine the records for all students, the dean randomly selects 200 students and finds that 118 of them are planning to attend graduate school. The 95% confidence interval for p is 0.59 ± 0.07. Interpret this interval.
(a) You are 95% confident that the true proportion of all students planning to attend graduate school is between 0.52 and 0.66.
(b) There is a 95% chance of selecting a sample that finds that between 52% and 66% of the students are planning to attend graduate school.
(c) You are 95% confident that between 52% and 66% of the sampled students are planning to attend graduate school.
(d) You are 95% confident that 59% of the students are planning to attend graduate school.

12. In estimating the population mean with the population standard deviation unknown, if the sample size is 12, there will be ______ degrees of freedom.

13. The Central Limit Theorem is important in statistics because
(a) It states that the population will always be approximately normally distributed.
(b) It states that the sampling distribution of the sample mean is approximately normally distributed for a large sample size regardless of the shape of the population.
(c) It states that the sampling distribution of the sample mean is approximately normally distributed for any population regardless of the sample size.
(d) For any sized sample, it says the sampling distribution of the sample mean is approximately normal.

14. For samples of n = 15, the sampling distribution of the mean will be normally distributed:
(a) regardless of the shape of the population
(b) if the shape of the population is symmetrical
(c) if the standard deviation of the mean is known
(d) if the population is normally distributed

Answer True or False:

15. Other things being equal, as the confidence level for a confidence interval increases, the width of the interval increases.

16. As the sample size increases, the effect of an extreme value on the sample mean becomes smaller.

17. A sampling distribution is defined as the probability distribution of possible sample sizes that can be observed from a given population.

18. The t distribution is used to construct confidence intervals for the population mean when the population standard deviation is unknown.

19. In the construction of confidence intervals, if all other quantities are unchanged, an increase in the sample size will lead to a wider interval.

20. The confidence interval estimate that is constructed will always correctly estimate the population parameter.

Problems

1. A random sample of 90 observations produced a mean $\bar{x}$ = 25.9 and a standard deviation s = 2.7.
a. Find a 95% confidence interval for the population mean $\mu$.
b. Find a 90% confidence interval for $\mu$.
c. Find a 99% confidence interval for $\mu$.
d. What happens to the width of a confidence interval as the value of the confidence coefficient is increased while the sample size is held fixed?
e. Would your confidence intervals of parts a-c be valid if the distribution of the original population was not normal? Explain.

2. The following random sample was selected from a normal distribution: 4, 6, 3, 5, 9, 3.
a. Construct a 90% confidence interval for the population mean.
b. Construct a 95% confidence interval for the population mean.

3. Periodically, the Hillsborough County (Florida) Water Department tests the drinking water of homeowners for contaminants such as lead and copper. The lead and copper levels in water specimens collected in 1998 for a sample of 10 residents of the Crystal Lake Manors subdivision are shown in the next column.
a. Construct a 99% confidence interval for the mean lead level in water specimens from Crystal Lake Manors.
b. Construct a 99% confidence interval for the mean copper level in water specimens from Crystal Lake Manors.
c. Interpret the intervals, parts a and b, in the words of the problem.

4. A random sample of 50 consumers taste tested a new snack food. Their responses were coded (0: do not like; 1: like; 2: indifferent) and recorded as follows:
a. Use an 80% confidence interval to estimate the proportion of consumers who like the snack food.
b. Provide a statistical interpretation for the confidence interval you constructed in part a.

5. By law, all new cars must be equipped with both driver-side and passenger-side safety air bags. There is concern, however, over whether air bags pose a danger for children sitting on the passenger side. In a National Highway Traffic Safety Administration (NHTSA) study of 55 people killed by the explosive force of air bags, 35 were children seated on the front-passenger side (Wall Street Journal, Jan. 22, 1997). This study led some car owners with children to disconnect the passenger-side air bag. Consider all fatal automobile accidents in which it is determined that air bags were the cause of death.
Let p represent the true proportion of these accidents involving children seated on the front-passenger side.
a. Use the data from the NHTSA study to estimate p.
b. Construct a 99% confidence interval for p.
c. Interpret the interval, part b, in the words of the problem.

6. The accounting firm of Price Waterhouse annually monitors the U.S. Postal Service's performance. One parameter of interest is the percentage of mail delivered on time. In a sample of 332,000 items mailed between Dec. 10 and Mar. 3 (the most difficult delivery season due to bad weather and holidays), Price Waterhouse determined that 282,200 items were delivered on time (Tampa Tribune, Mar. 26, 1995). Use this information to estimate with 99% confidence the true percentage of items delivered on time by the U.S. Postal Service. Interpret the result.

7. The data in the file MoviePrices contain the price for two tickets with online service charges, large popcorn, and two medium soft drinks at a sample of six theatre chains:
$36.15 $31.00 $35.05 $40.25 $33.75 $43.00
Construct a 95% confidence interval estimate of the population mean price for two tickets with online service charges, large popcorn, and two medium soft drinks.

8. Tuna sushi was purchased from 13 Manhattan restaurants and tested for mercury. The number of pieces it would take to reach what the Environmental Protection Agency considers to be an acceptable level to be regularly consumed was as follows:
8.6 2.6 1.6 5.2 7.7 4.7 6.4 6.2 3.6 4.9 9.9 3.3 4.1
Construct a 95% confidence interval estimate of the population mean number of pieces it would take to reach what the Environmental Protection Agency considers to be an acceptable level to be regularly consumed.

9. The following data represent the viscosity (friction, as in automobile oil) taken from 120 manufacturing batches (ordered from lowest viscosity to highest viscosity).
12.6 12.8 13.0 13.1 13.3 13.3 13.4 13.5 13.6 13.7
13.7 13.7 13.8 13.8 13.9 13.9 14.0 14.0 14.0 14.1
14.1 14.1 14.2 14.2 14.2 14.3 14.3 14.3 14.3 14.3
14.3 14.4 14.4 14.4 14.4 14.4 14.4 14.4 14.4 14.5
14.5 14.5 14.5 14.5 14.5 14.6 14.6 14.6 14.7 14.7
14.8 14.8 14.8 14.8 14.9 14.9 14.9 14.9 14.9 14.9
14.9 15.0 15.0 15.0 15.0 15.1 15.1 15.1 15.1 15.2
15.2 15.2 15.2 15.2 15.2 15.2 15.2 15.3 15.3 15.3
15.3 15.3 15.4 15.4 15.4 15.4 15.5 15.5 15.6 15.6
15.6 15.6 15.6 15.7 15.7 15.7 15.8 15.8 15.9 15.9
16.0 16.0 16.0 16.0 16.1 16.1 16.1 16.2 16.3 16.4
16.4 16.5 16.5 16.6 16.8 16.9 16.9 17.0 17.6 18.6
a. Construct a 95% confidence interval estimate of the population mean viscosity.
b. Do you need to assume that the population viscosity is normally distributed to construct the confidence interval estimate for the mean viscosity? Explain.

10. In a survey of 1,000 airline travelers, 760 responded that the airline fee that is most unreasonable was additional charges to redeem points/miles (extracted from "Snapshots: Which Airline Fee Is Most Unreasonable?," USA Today, December 2, 2008, p. B1). Construct a 95% confidence interval estimate of the population proportion of airline travelers who think that the airline fee that is most unreasonable was additional charges to redeem points/miles.

11. In a survey of 2,395 adults, 1,916 reported that emails are easy to misinterpret, but only 1,269 reported that telephone conversations are easy to misinterpret (extracted from "Snapshots: Open to Misinterpretation," USA Today, July 17, 2007, p. 1D).
a. Construct a 95% confidence interval estimate of the population proportion of adults who report that emails are easy to misinterpret.
b.
Construct a 95% confidence interval estimate of the population proportion of adults who report that telephone conversations are easy to misinterpret.
c. Compare the results of (a) and (b).

12. You want to estimate the proportion of newspapers printed that have a nonconforming attribute, such as excessive ruboff, improper page setup, missing pages, or duplicate pages. A random sample of n = 200 newspapers is selected from all the newspapers printed during a single day. In this sample, 35 contain some type of nonconformance. Construct a 90% confidence interval for the proportion of newspapers printed during the day that have a nonconforming attribute.