Statistics Assignment 11 - Solutions

Statistics 44.3 Assignment 11 - Solutions

1. Samples were taken of individuals with each blood type to see if the average white blood cell count differed among types. Eleven individuals in each group were sampled. The results are given in the table below:

Average White Blood Cell Count by Blood Type

          A            B            AB           O
          5000         7000         7200         5550
          5550         7500         7770         6570
          6000         8500         8600         7620
          6500         5000         6000         5900
          8000         6100         5950         7100
          7700         7200         7540         6980
          10000        9900         11000        8750
          6100         6400         6200         7700
          7200         7300         7000         8100
          5500         5800         6100         4900
          9000         8950         7800         5800
Σx        76550        79650        81160        74970
Σx²       557642500    597552500    620917000    525084700

a. Construct an ANOVA table for testing the equality of average white blood cell counts among blood types.

$\sum_{i=1}^{4}\sum_{j=1}^{11} x_{ij}^2 = 2301196700$, $\sum_{i=1}^{4} T_i^2/n_i = 2219224681.8$, $G = 312330$, $N = 44$, where $T_i$ is the total for group $i$ and $G$ is the grand total.

The ANOVA table

Source    S.S.           d.f.   M.S.         F
Between   2178570.455    3      726190.15    0.354
Within    81972018.18    40     2049300.5
Total     84150588.64    43
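As a cross-check on the ANOVA table, the computational formulas above can be sketched in Python. This is a minimal illustration, not the method used to produce the original table. The dictionary `groups` simply re-enters the data from the table, and `one_way_anova` is a hypothetical helper introduced here.

```python
# One-way ANOVA for Problem 1 (white blood cell count by blood type),
# using the computational formulas from the solution:
#   SS_total   = sum of x_ij^2 - G^2/N
#   SS_between = sum of T_i^2/n_i - G^2/N
#   SS_within  = sum of x_ij^2 - sum of T_i^2/n_i

# Data re-entered from the table above (11 subjects per blood type).
groups = {
    "A": [5000, 5550, 6000, 6500, 8000, 7700, 10000, 6100, 7200, 5500, 9000],
    "B": [7000, 7500, 8500, 5000, 6100, 7200, 9900, 6400, 7300, 5800, 8950],
    "AB": [7200, 7770, 8600, 6000, 5950, 7540, 11000, 6200, 7000, 6100, 7800],
    "O": [5550, 6570, 7620, 5900, 7100, 6980, 8750, 7700, 8100, 4900, 5800],
}


def one_way_anova(samples):
    """Return (ss_between, ss_within, ss_total, df_between, df_within, f_stat)."""
    all_x = [x for s in samples for x in s]
    n_total = len(all_x)                                   # N
    k = len(samples)                                       # number of groups
    correction = sum(all_x) ** 2 / n_total                 # G^2 / N
    sum_sq = sum(x * x for x in all_x)                     # sum of x_ij^2
    sum_t2_n = sum(sum(s) ** 2 / len(s) for s in samples)  # sum of T_i^2 / n_i
    ss_between = sum_t2_n - correction
    ss_within = sum_sq - sum_t2_n
    ss_total = sum_sq - correction
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, ss_total, df_between, df_within, f_stat


ssb, ssw, sst, dfb, dfw, f = one_way_anova(list(groups.values()))
print(f"Between  SS={ssb:.3f}  df={dfb}  MS={ssb / dfb:.2f}  F={f:.3f}")
print(f"Within   SS={ssw:.2f}  df={dfw}  MS={ssw / dfw:.1f}")
print(f"Total    SS={sst:.2f}  df={dfb + dfw}")
```

If SciPy is available, `scipy.stats.f_oneway(*groups.values())` should return the same F statistic together with a p-value.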

b. State conclusions and plot appropriate graphs to illustrate the results.

Conclusions: Since $F = 0.354 < F_{0.05}(3, 40) = 2.84$ ($df_1 = 3$, $df_2 = 40$), we conclude that there is no significant difference in average white blood cell count among the four blood types (A, B, AB, O).

Table: mean white blood cell count for blood types (A, B, AB, O)

A          B          AB         O
6959.09    7240.91    7378.18    6815.45

Figure: mean white blood cell count for blood types (A, B, AB, O).

An alternative way of illustrating the results is through box plots.

Figure: box plots of white blood cell count (COUNT) by blood type (TYPE); legend: median, 25%-75%, non-outlier max and min, outliers.
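The conclusion in part (b) rests on the ratio of the two mean squares in the ANOVA table, with $k = 4$ groups and $N = 44$ observations. Written out as a worked equation using the table's values:

```latex
F = \frac{MS_{\text{between}}}{MS_{\text{within}}}
  = \frac{SS_{\text{between}}/(k-1)}{SS_{\text{within}}/(N-k)}
  = \frac{2178570.455/3}{81972018.18/40}
  = \frac{726190.15}{2049300.5}
  \approx 0.354 < F_{0.05}(3, 40) \approx 2.84
```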

2. Researchers studied the association between birth mothers' smoking habits and the birth weights of their babies. A sample of size n = 11 subjects was selected from each of the four groups.
Group 1: nonsmokers
Group 2: smokers who smoke less than 1 pack per day
Group 3: smokers who smoke more than 1 pack but less than 2 packs per day
Group 4: smokers who smoke more than 2 packs per day

The data are tabulated below:

Table: Birth weights (in grams) of infants of mothers (n = 11) in four smoking groups (values listed in increasing order within each group)

          nonsmokers   < 1 pack     1 to 2 packs > 2 packs
          3174         2890         1775         1566
          3232         2995         2099         1676
          3421         2997         2322         1783
          3459         3002         2479         1882
          3510         3031         2500         2001
          3580         3101         2555         2002
          3852         3111         2608         2118
          3884         3120         2778         2121
          3982         3400         2901         2200
          3998         3444         2985         2232
          4055         3764         3100         2331
T_i = Σx  40147        34855        28102        21912
Σx²       147531235    111120853    73326550     44234300
Mean      3649.73      3168.64      2554.73      1992.00

Use the above data to construct an ANOVA table to determine if there is a significant difference in the average birth weight amongst the four groups. Illustrate your findings graphically.

$\sum_{i=1}^{4}\sum_{j=1}^{11} x_{ij}^2 = 376212938$, $\sum_{i=1}^{4} T_i^2/n_i = 372410071.09$, $G = 125016$, $N = 44$

The ANOVA table

Source    S.S.           d.f.   M.S.         F
Between   17205519.8     3      5735173.3    60.325
Within    3802866.909    40     95071.673
Total     21008386.73    43

Conclusions: Since $F = 60.325 > F_{0.05}(3, 40) = 2.84$ ($df_1 = 3$, $df_2 = 40$), we conclude that there is a significant difference in average birth weight among the four smoking groups.
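The sums of squares in this ANOVA table follow from the summary totals alone, using the same computational formulas as in Problem 1. The correction term $G^2/N$ is written out here for clarity; it does not appear in the original table.

```latex
\begin{aligned}
\frac{G^2}{N} &= \frac{125016^2}{44} = 355204551.27 \\
SS_{\text{between}} &= \sum_i \frac{T_i^2}{n_i} - \frac{G^2}{N} = 372410071.09 - 355204551.27 = 17205519.82 \\
SS_{\text{within}} &= \sum_i \sum_j x_{ij}^2 - \sum_i \frac{T_i^2}{n_i} = 376212938 - 372410071.09 = 3802866.91 \\
SS_{\text{total}} &= \sum_i \sum_j x_{ij}^2 - \frac{G^2}{N} = 376212938 - 355204551.27 = 21008386.73 \\
F &= \frac{17205519.82/3}{3802866.91/40} = \frac{5735173.27}{95071.67} \approx 60.325
\end{aligned}
```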

Figure: average birth weight (grams) by smoking group (nonsmokers, < 1 pack, 1 to 2 packs, > 2 packs).

This graph indicates that average birth weight decreases as the level of smoking increases.

3. In the following study the investigator was interested in determining if the presence of heart disease was related to systolic blood pressure. The study consisted of four groups of subjects with differing levels of systolic blood pressure (< 127, 127-146, 147-166, 167+). The data are tabulated below:

Coronary        Systolic Blood Pressure (mm Hg)
Heart Disease   < 127     127-146   147-166   167+      Total
Present         20        28        20        24        92
Absent          388       527       204       118       1237
Total           408       555       224       142       1329

Determine if there is a relationship between the presence of heart disease and systolic blood pressure.

Expected frequencies

Coronary        Systolic Blood Pressure (mm Hg)
Heart Disease   < 127     127-146   147-166   167+      Total
Present         28.244    38.420    15.506    9.830     92
Absent          379.756   516.580   208.494   132.170   1237
Total           408       555       224       142       1329
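Each expected frequency is (row total)(column total)/N; for example, $E_{11} = 92 \times 408 / 1329 = 28.244$. The short Python sketch below reproduces the expected table for Problem 3, together with the standardized residuals $(x - E)/\sqrt{E}$ and the chi-square statistic reported for this problem. It uses only the standard library; `chi_square_independence` is a hypothetical helper written for this illustration, and it computes neither a p-value nor a critical value (the 7.815 cutoff comes from the solution text).

```python
import math

# Problem 3: coronary heart disease (rows) by systolic BP group (columns).
bp_groups = ["<127", "127-146", "147-166", "167+"]
observed = [
    [20, 28, 20, 24],      # heart disease present
    [388, 527, 204, 118],  # heart disease absent
]


def chi_square_independence(table):
    """Return (expected, residuals, chi2, df) for a two-way table of counts.

    expected[i][j]  = row_total[i] * col_total[j] / N
    residuals[i][j] = (observed - expected) / sqrt(expected)
    chi2            = sum of squared standardized residuals
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    residuals = [
        [(x - e) / math.sqrt(e) for x, e in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(table, expected)
    ]
    chi2 = sum(r * r for row in residuals for r in row)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return expected, residuals, chi2, df


expected, residuals, chi2, df = chi_square_independence(observed)
for label, e, r in zip(bp_groups, expected[0], residuals[0]):
    print(f"{label:>8}: expected present = {e:.3f}, residual = {r:+.3f}")
# The 0.05 critical value (7.815 for 3 df) is taken from the text, not computed here.
print(f"chi2 = {chi2:.3f} on {df} df")

# Percentage with heart disease present in each BP group.
for label, present, absent in zip(bp_groups, observed[0], observed[1]):
    print(f"{label:>8}: {100 * present / (present + absent):.1f}% present")
```

With SciPy, `scipy.stats.chi2_contingency(observed)` should give the same statistic and expected table. Its Yates continuity correction applies only when there is 1 degree of freedom, so it does not affect this 2 x 4 table.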

Standardized residuals $r_{ij} = (x_{ij} - E_{ij})/\sqrt{E_{ij}}$

Coronary        Systolic Blood Pressure (mm Hg)
Heart Disease   < 127     127-146   147-166   167+
Present         -1.551    -1.681    1.141     4.520
Absent          0.423     0.458     -0.311    -1.233

$\chi^2 = \sum_i \sum_j \frac{(x_{ij} - E_{ij})^2}{E_{ij}} = \sum_i \sum_j r_{ij}^2 = 28.966$

Since $\chi^2 = 28.966 > \chi^2_{0.05} = 7.815$ for $(2-1)(4-1) = 3$ df, the null hypothesis of independence is rejected. Examining the standardized residuals, we see a higher incidence of heart disease when systolic BP is 167+ than one would expect if the two variables were independent. This is illustrated with the following graph.

Figure: Presence of Heart Disease (%) by Systolic BP (< 127, 127 to 146, 147 to 166, 167+).

4. A study of reading errors made by second grade pupils was carried out to help decide whether the use of different sorts of drills for pupils of different reading abilities was warranted. Errors were categorized as follows:
DK: Did not know the word at all
C: Substitution of a word of similar configuration (e.g. "bad" for "had")
T: Substitution of a synonym suggested by the context
OS: Other substitution

The students had been clustered into three relatively homogeneous reading groups on the basis of (1) their reading achievement scores at the end of first grade and (2) their verbal IQs. Group A consisted of the least able readers, Group C consisted of the most able readers, and Group B made up the middle group. Five children were in Group A, nine children were in Group B and eleven children were in Group C. The numbers of errors of each type made by each child were added to obtain the group totals given below. Analyze these data.

           T        C        OS       DK       Total
Group A    5        10       15       53       83
Group B    28       34       72       172      306
Group C    8        10       15       36       69
Total      41       54       102      261      458

Expected frequencies

           T        C        OS       DK       Total
Group A    7.430    9.786    18.485   47.299   83
Group B    27.393   36.079   68.148   174.380  306
Group C    6.177    8.135    15.367   39.321   69
Total      41       54       102      261      458

Standardized residuals $r_{ij} = (x_{ij} - E_{ij})/\sqrt{E_{ij}}$

           T        C        OS       DK
Group A    -0.892   0.068    -0.811   0.829
Group B    0.116    -0.346   0.467    -0.180
Group C    0.734    0.654    -0.094   -0.530

$\chi^2 = \sum_i \sum_j \frac{(x_{ij} - E_{ij})^2}{E_{ij}} = \sum_i \sum_j r_{ij}^2 = 3.7816$

Since $\chi^2 = 3.7816 < \chi^2_{0.05} = 12.592$ for $(3-1)(4-1) = 6$ df, the null hypothesis of independence is not rejected. This test does not ask whether one group makes more errors on average than the other groups. The chi-square test asks whether the proportion of errors of each type (T, C, OS or DK) differs among the three groups (A: least able, B: middle, C: most able). The test concludes that there is no significant difference.
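To make the last point concrete, the sketch below compares each group's share of errors of each type with the pooled share across all 458 errors. It is illustrative only, and the names (`counts`, `proportions`, `expected_A`) are hypothetical. Under independence, every expected count equals the group total times the pooled share, which is the same as row total times column total divided by N.

```python
# Problem 4: the chi-square test compares each group's mix of error types
# with the pooled mix; it does not compare the groups' total error counts.
error_types = ["T", "C", "OS", "DK"]
counts = {
    "A": [5, 10, 15, 53],    # least able readers
    "B": [28, 34, 72, 172],  # middle group
    "C": [8, 10, 15, 36],    # most able readers
}


def proportions(row):
    """Share of a group's errors that fall in each error type."""
    total = sum(row)
    return [x / total for x in row]


column_totals = [sum(col) for col in zip(*counts.values())]
grand_total = sum(column_totals)
pooled = [c / grand_total for c in column_totals]

for group, row in counts.items():
    shares = ", ".join(f"{t}={p:.3f}" for t, p in zip(error_types, proportions(row)))
    print(f"Group {group}: {shares}")
print("Pooled:  " + ", ".join(f"{t}={p:.3f}" for t, p in zip(error_types, pooled)))

# Under independence, expected count = group total x pooled share,
# i.e. row total x column total / N. Group A is shown as an example.
expected_A = [sum(counts["A"]) * p for p in pooled]
print("Group A expected:", [round(e, 3) for e in expected_A])
```

Group A's DK share (53/83, about 0.64) is above the pooled DK share (261/458, about 0.57), in line with its positive DK residual. However, the chi-square statistic shows that differences of this size are not significant.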