In the name of God, the Most Gracious, the Most Merciful

COMPUTER & DATA ANALYSIS
Theoretical Exam

FINAL THEORETICAL EXAMINATION
Monday 15th, 2007
Instructor: Dr. Samir Safi

Name:
ID Number:
Instructor:

INSTRUCTIONS:
1. Write your name, student ID and section number.
2. You have TWO hours.
3. This exam must be your own work entirely. You cannot talk to or share information with anyone.

PLEASE DON'T WRITE ON THIS TABLE

Student's Points
Question    #1    #2    #3    #4    #5    #6    #7    Total
Point
Question #1: (20 Points)
For each of the situations described below, state the statistical technique and the sample(s) type that you believe is the most applicable.
Example: Two independent samples - t test.

1. As part of an attitude survey, a sample of men and women are asked to rate a number of statements on a scale of 1 to 5, according to whether they agree or disagree. We wish to determine whether there is a significant difference between the answers of men and women.
Answer: Two independent samples - Mann-Whitney test.

2. Investors use many "indicators" in their attempts to predict the behavior of the stock market. One of these is the "January indicator." Some investors believe that if the market is up in January, then it will be up for the rest of the year. We wish to determine if there is a relationship between the market's direction in January and the market's direction the rest of the year.
Answer: Chi-square test.

3. Bastien, Inc. has been manufacturing small automobiles that have averaged 50 miles per gallon of gasoline in highway driving. The company has developed a more efficient engine for its small cars and now advertises that its new small cars average more than 50 miles per gallon in highway driving. An independent testing service road-tested 25 of the automobiles. We wish to determine whether or not the manufacturer's advertising campaign is legitimate.
Answer: One sample - t test.

4. The Anderson Company has sent two groups of employees to a privately run program providing word-processing training. One group was from the data-processing department; the other was from the typing pool. At the completion of the program, the Anderson Company received a report showing the class rank for each of its employees. We wish to determine whether there is a performance difference between the two groups in the word-processing program.
Answer: Two independent samples - Mann-Whitney test.

5. A credit company wants to see if there is any difference in the average amount owed by people under 30 years old and by people over 30 years. Independent random samples of five were taken from both age groups. It can be assumed that the population variances are the same. We wish to determine if there is a difference between the average amounts owed by the two age groups.
Answer: Two independent samples.

6. A large corporation wants to determine whether or not the typing efficiency course given at a local college can increase the typing speeds of its word processing personnel. A sample of 6 typists is selected, and they are sent to take the course. We wish to test whether it can be concluded that taking the course will actually increase the average typing speeds of the typists.
Answer: Paired samples.

7. The Excellent Drug Company claims its aspirin tablets will relieve headaches faster than any other aspirin on the market. To determine whether Excellent's claim is valid, random samples of size 15 are chosen from aspirins made by Excellent and the Simple Drug Company. An aspirin is given to each of the 30 randomly selected persons suffering from headaches and the number of minutes required for each to recover from the headache is recorded. We wish to determine whether Excellent's aspirin cures headaches significantly faster than Simple's aspirin.
Answer: Two independent samples.

8. An automobile manufacturer is trying to determine if 5 different types of bumpers differ in their reaction to low-speed collisions. An experiment was conducted where 40 bumpers of each of 5 different types were installed on midsize cars, which were then driven into a wall at 5 miles per hour. The cost of repairing the damage in each case was assessed.
Answer: ANOVA.

9. Is marital status related to health in the elderly? To answer this question, two hundred elderly people whose marital status is known (single, married, widowed, or divorced) are rated as to whether they are in good, fair, or poor health. Is there evidence of a relationship?
Answer: Chi-square test.

10. One company hires employees for its management staff from three local colleges. The company's personnel department has been collecting and reviewing annual performance ratings in an attempt to determine if there are differences in performance among the managers hired from these colleges. Performance-rating data are available from independent samples of seven employees from college A, six employees from college B, and seven employees from college C. We wish to determine whether the three populations are identical with respect to performance evaluations.
Answer: Kruskal-Wallis test.
Question #2: (14 Points)
The following data are metabolic expenditures (amount of energy expended by patients) for 8 patients admitted to a hospital for reasons other than trauma and for 8 patients admitted for trauma (multiple fractures). Using α = .01 and the SPSS output, give an interpretation for each of the following:

Nontrauma    18.7   17.7   21.7   17.8   21.5   19.4   19.5   21.3
Trauma       25.1   38.4   35.9   26.4   25.7   20.3   21.4   24.7

(a) (3 Points) Examine the distribution of these scores. Does it seem normal?
(b) (3 Points) A couple of values are much higher than the rest. Explain why outliers can cause a problem for t-analyses.
(c) (6 Points) Carry out a two-sample t-test comparing the means of the populations.
(d) (3 Points) Do the results of the Wilcoxon test and the usual t-test agree?
SPSS Output for question #2

Independent Samples Test

                               Levene's Test for
                               Equality of Variances   t-test for Equality of Means
                               F        Sig.           t         df       Sig. (2-tailed)
Equal variances assumed        7.046    .019           -3.181    14       .007
Equal variances not assumed                            -3.181    7.874    .013

[Figure: Normal Q-Q Plot of ENERGY. Horizontal axis: Observed Value; vertical axis: Expected Normal Value.]

Ranks

ENERGY    TRAUMA       N     Mean Rank   Sum of Ranks
          Nontrauma    8     5.13        41.00
          Trauma       8     11.88       95.00
          Total        16

Test Statistics (b)

                                   ENERGY
Mann-Whitney U                     5.000
Wilcoxon W                         41.000
Z                                  -2.836
Asymp. Sig. (2-tailed)             .005
Exact Sig. [2*(1-tailed Sig.)]     .003 (a)

a. Not corrected for ties.
b. Grouping Variable: TRAUMA
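As a cross-check on the SPSS output for Question #2, the following Python sketch recomputes the rank sums, the Mann-Whitney U statistic, and the pooled and Welch t statistics from the sixteen observations listed in the question. It is only an illustration: the function names `rank_sums` and `t_statistics` are made up for this example and are not SPSS procedures, and only the Python standard library is assumed.

```python
import math
import statistics

nontrauma = [18.7, 17.7, 21.7, 17.8, 21.5, 19.4, 19.5, 21.3]
trauma = [25.1, 38.4, 35.9, 26.4, 25.7, 20.3, 21.4, 24.7]


def rank_sums(x, y):
    """Return the rank sums of x and y in the combined sample (midranks for ties)."""
    combined = sorted(x + y)
    rank_of = {}
    for value in set(combined):
        first = combined.index(value) + 1      # 1-based position of first occurrence
        count = combined.count(value)          # number of tied observations
        rank_of[value] = first + (count - 1) / 2
    return sum(rank_of[v] for v in x), sum(rank_of[v] for v in y)


def t_statistics(x, y):
    """Return (pooled t, pooled df, Welch t, Welch df) for mean(x) - mean(y)."""
    n1, n2 = len(x), len(y)
    v1, v2 = statistics.variance(x), statistics.variance(y)
    diff = statistics.mean(x) - statistics.mean(y)
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t_pooled = diff / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    a, b = v1 / n1, v2 / n2
    t_welch = diff / math.sqrt(a + b)
    df_welch = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t_pooled, n1 + n2 - 2, t_welch, df_welch


w_nontrauma, w_trauma = rank_sums(nontrauma, trauma)
# U for the nontrauma group: rank sum minus its smallest possible value.
u_stat = w_nontrauma - len(nontrauma) * (len(nontrauma) + 1) / 2
t_pooled, df_pooled, t_welch, df_welch = t_statistics(nontrauma, trauma)

print("Rank sums:", w_nontrauma, w_trauma, "U:", u_stat)
print("Pooled t:", round(t_pooled, 3), "df:", df_pooled)
print("Welch t:", round(t_welch, 3), "df:", round(df_welch, 3))
```

With equal group sizes the pooled and Welch t statistics coincide, so only the degrees of freedom differ between the two rows of the Independent Samples Test table. The p-values themselves are not recomputed here because the standard library has no t distribution; they should be read from the SPSS output.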
Question #3: (15 Points)
A chain of convenience stores wanted to test three different advertising policies:

Policy 1: No advertising.
Policy 2: Advertise in neighborhoods with circulars.
Policy 3: Use circulars and advertise in newspapers.

Eighteen stores were randomly selected and divided randomly into three groups of six stores. Each group used one of the three policies. Following the implementation of the policies, sales figures were obtained for each of the stores during a 1-month period. Using the SPSS output, give an interpretation for each of the following:

1. (3 Points) Test of Homogeneity of Variances.
2. (6 Points) Explain the result of the ANOVA Table.
3. (6 Points) Discuss all the multiple comparisons.
SPSS Output for question #3

Test of Homogeneity of Variances

DATA
Levene Statistic   df1   df2   Sig.
.841               2     15    .451

ANOVA

DATA
                  Sum of Squares   df   Mean Square   F       Sig.
Between Groups    115.111          2    57.556        8.534   .003
Within Groups     101.167          15   6.744
Total             216.278          17

Multiple Comparisons

Dependent Variable: DATA
Bonferroni
                          Mean Difference                          95% Confidence Interval
(I) GROUP   (J) GROUP     (I-J)             Std. Error   Sig.     Lower Bound   Upper Bound
Policy 1    Policy 2      -.6667            1.49938      1.000    -4.7056       3.3723
            Policy 3      -5.6667*          1.49938      .005     -9.7056       -1.6277
Policy 2    Policy 1      .6667             1.49938      1.000    -3.3723       4.7056
            Policy 3      -5.0000*          1.49938      .014     -9.0389       -.9611
Policy 3    Policy 1      5.6667*           1.49938      .005     1.6277        9.7056
            Policy 2      5.0000*           1.49938      .014     .9611         9.0389

*. The mean difference is significant at the .05 level.
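The arithmetic behind the ANOVA table and the Bonferroni standard error for Question #3 can be sketched in a few lines of Python. The sketch uses only the sums of squares and degrees of freedom printed in the SPSS output and the group size of six stores stated in the question; the variable names are illustrative.

```python
import math

# Values copied from the SPSS ANOVA table.
ss_between, df_between = 115.111, 2
ss_within, df_within = 101.167, 15
n_per_group = 6  # six stores per policy, as stated in the question

# Mean squares are sums of squares divided by their degrees of freedom.
ms_between = ss_between / df_between
ms_within = ss_within / df_within

# The F ratio compares between-group to within-group variability.
f_ratio = ms_between / ms_within

# Standard error of the difference between two group means
# when both groups have the same size.
se_diff = math.sqrt(ms_within * (2 / n_per_group))

print("MS between:", round(ms_between, 3))
print("MS within:", round(ms_within, 3))
print("F:", round(f_ratio, 3))
print("SE of a pairwise difference:", round(se_diff, 5))
```

The same standard error applies to every row of the Multiple Comparisons table because all three groups have six stores.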
Question #4: (12 Points)
An experiment was conducted to evaluate the effectiveness of a treatment for tapeworm in the stomachs of sheep. A random sample of 24 worm-infected lambs of approximately the same age and health was randomly divided into two groups. Twelve of the lambs were injected with the drug and the remaining twelve were left untreated. After a 6-month period, the lambs were slaughtered and the following worm counts were recorded:

Drug-Treated Sheep   18   43   28   50   16   32   13   35   38   33   6    7
Untreated Sheep      40   54   26   63   21   37   39   23   48   58   28   39

Using α = .05 and the SPSS output, give an interpretation for each of the following:

a. (3 Points) What's the suitable statistical technique?
b. (6 Points) Test whether the mean number of tapeworms in the stomachs of the treated lambs is less than the mean for untreated lambs.
c. (3 Points) Place and interpret a 95% confidence interval on µ1 - µ2 to assess the size of the difference in the two means.
SPSS Output for question #4

Group Statistics

Worm counts
CODING                N     Mean      Std. Deviation   Std. Error Mean
Drug-Treated Sheep    12    26.5833   14.36193         4.14593
Untreated Sheep       12    39.6667   13.85859         4.00063

Independent Samples Test

Worm counts
                               Levene's Test for
                               Equality of Variances   t-test for Equality of Means
                                                                                     Mean         Std. Error   95% Confidence Interval of the Difference
                               F       Sig.            t        df       Sig. (2-tailed)   Difference   Difference   Lower        Upper
Equal variances assumed        .205    .655            -2.271   22       .033              -13.0833     5.76141      -25.03176    -1.13491
Equal variances not assumed                            -2.271   21.972   .033              -13.0833     5.76141      -25.03264    -1.13403

Paired Samples Statistics

Pair 1                Mean      N     Std. Deviation   Std. Error Mean
Drug-Treated Sheep    26.5833   12    14.36193         4.14593
Untreated Sheep       39.6667   12    13.85859         4.00063

Paired Samples Correlations

Pair 1                                  N     Correlation   Sig.
Drug-Treated Sheep & Untreated Sheep    12    .583          .046

Paired Samples Test

Pair 1: Drug-Treated Sheep - Untreated Sheep
Paired Differences
                                                95% Confidence Interval of the Difference
Mean       Std. Deviation   Std. Error Mean     Lower       Upper       t        df    Sig. (2-tailed)
-13.0833   12.88733         3.72025             -21.2716    -4.8951     -3.517   11    .005
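For Question #4, the equal-variances row of the Independent Samples Test can be reproduced from the raw worm counts. The Python sketch below is one way to do it, using only the standard library. The critical value `T_CRIT_22` is an assumption taken from a standard t table (upper 2.5% point with 22 degrees of freedom), and the one-sided p-value is obtained by halving the two-sided value that SPSS reports, which is valid here because the observed difference lies in the direction of the alternative.

```python
import math
import statistics

treated = [18, 43, 28, 50, 16, 32, 13, 35, 38, 33, 6, 7]
untreated = [40, 54, 26, 63, 21, 37, 39, 23, 48, 58, 28, 39]

T_CRIT_22 = 2.0739   # assumed table value: t(0.025, df = 22)
SPSS_TWO_SIDED_P = 0.033  # Sig. (2-tailed) copied from the SPSS output

n1, n2 = len(treated), len(untreated)
mean_diff = statistics.fmean(treated) - statistics.fmean(untreated)

# Pooled variance: weighted average of the two sample variances.
pooled_var = ((n1 - 1) * statistics.variance(treated)
              + (n2 - 1) * statistics.variance(untreated)) / (n1 + n2 - 2)
se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
t_stat = mean_diff / se_diff

# 95% confidence interval for mu1 - mu2.
ci_low = mean_diff - T_CRIT_22 * se_diff
ci_high = mean_diff + T_CRIT_22 * se_diff

# H1: mu1 < mu2, and t is negative, so halve the two-sided p-value.
one_sided_p = SPSS_TWO_SIDED_P / 2

print("Mean difference:", round(mean_diff, 4))
print("SE:", round(se_diff, 5), "t:", round(t_stat, 3))
print("95% CI:", round(ci_low, 2), "to", round(ci_high, 2))
print("One-sided p (approx.):", one_sided_p)
```

Because the two groups were formed by random division rather than by matching, the Paired Samples tables in the output do not correspond to the design described in the question.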
Question #5: (11 Points)
Many states are considering lowering the blood alcohol level at which a driver is designated as driving under the influence (DUI) of alcohol. An investigator for a legislative committee designed the following test to study the effect of alcohol on reaction time. Ten participants consumed a specified amount of alcohol. Another group of ten participants consumed the same amount of a nonalcoholic drink, a placebo. The twenty participants' average reaction times (in seconds) to a series of simulated driving situations are reported in the following table. Does it appear that alcohol consumption increases reaction time?

Placebo   0.90   0.37   1.63   0.83   0.95   0.78   0.86   0.61   0.38   1.97
Alcohol   1.46   1.45   1.76   1.44   1.11   3.07   0.98   1.27   2.56   1.32

Using α = .05 and the SPSS output, give an interpretation for each of the following:

a. (3 Points) Why is the t test inappropriate for analyzing the data in this study?
b. (6 Points) Use the Wilcoxon rank sum test to test the hypotheses:
   H0: The distributions of reaction times for the placebo and alcohol populations are identical.
   H1: The distribution of reaction times for the placebo population is shifted to the left of the distribution for the alcohol population. (Larger reaction times are associated with the consumption of alcohol.)
c. (2 Points) Place 95% confidence intervals on the median reaction times for the two groups.
SPSS Output for question #5

[Figure: Side-by-side boxplots of reaction time for the Placebo population (N = 10) and the Alcohol population (N = 10). Cases 3 and 10 are flagged in the placebo group and cases 6 and 9 in the alcohol group.]

[Figure: Normal Q-Q Plot of Placebo population. Horizontal axis: Observed Value; vertical axis: Expected Normal.]

[Figure: Normal Q-Q Plot of Alcohol population. Horizontal axis: Observed Value; vertical axis: Expected Normal.]

Group Statistics

Blood-Alcohol
Code: 1 Placebo, 2:Alcohol   N     Mean     Std. Deviation   Std. Error Mean
Placebo                      10    .9280    .50868           .16086
Alcohol                      10    1.6420   .66416           .21003
Independent Samples Test

Blood-Alcohol
                               Levene's Test for
                               Equality of Variances   t-test for Equality of Means
                                                                                     Mean         Std. Error   95% Confidence Interval of the Difference
                               F       Sig.            t        df       Sig. (2-tailed)   Difference   Difference   Lower       Upper
Equal variances assumed        .669    .424            -2.699   18       .015              -.7140       .26455       -1.26980    -.15820
Equal variances not assumed                            -2.699   16.856   .015              -.7140       .26455       -1.27251    -.15549

Ranks

Blood-Alcohol
Code: 1 Placebo, 2:Alcohol   N     Mean Rank   Sum of Ranks
Placebo                      10    7.00        70.00
Alcohol                      10    14.00       140.00
Total                        20

Test Statistics (b)

                                   Blood-Alcohol
Mann-Whitney U                     15.000
Wilcoxon W                         70.000
Z                                  -2.646
Asymp. Sig. (2-tailed)             .008
Exact Sig. [2*(1-tailed Sig.)]     .007 (a)

a. Not corrected for ties.
b. Grouping Variable: Code: 1 Placebo, 2:Alcohol
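The link between the Ranks table and the Test Statistics table for Question #5 can be illustrated with the large-sample normal approximation to the Wilcoxon rank sum test. The Python sketch below starts from the placebo rank sum printed by SPSS and assumes no tie correction; `normal_cdf` is a small helper written for this example, not a library function.

```python
import math

# Values copied from the SPSS Ranks table.
n_placebo, n_alcohol = 10, 10
w_placebo = 70.0  # sum of ranks for the placebo group


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


# Mann-Whitney U: rank sum minus its minimum possible value.
u_stat = w_placebo - n_placebo * (n_placebo + 1) / 2

# Mean and standard deviation of U under H0 (no tie correction).
mean_u = n_placebo * n_alcohol / 2
sd_u = math.sqrt(n_placebo * n_alcohol * (n_placebo + n_alcohol + 1) / 12)

z = (u_stat - mean_u) / sd_u
p_two_sided = 2 * normal_cdf(-abs(z))
# H1 says the placebo distribution is shifted to the left,
# so the one-sided p-value is the lower-tail probability.
p_one_sided = normal_cdf(z)

print("U:", u_stat, "Z:", round(z, 3))
print("Two-sided p:", round(p_two_sided, 3))
print("One-sided p:", round(p_one_sided, 4))
```

Since the alternative in part (b) is one-sided, the relevant p-value is about half of the two-sided Asymp. Sig. value reported by SPSS.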
Question #6: (17 Points)
A team of researchers wants to compare the yields (in pounds) of five different varieties (A, B, C, D, E) of 4-year-old orange trees in one orchard. They obtain a random sample of seven trees of each variety from the orchard. Using α = .01 and the SPSS output, give an interpretation for each of the following:

a. (3 Points) Using tests and plots of the data, determine whether the conditions for using the ANOVA are satisfied.
b. (6 Points) Conduct an ANOVA test of the null hypothesis that the five varieties have the same mean yield.
c. (6 Points) Use the Kruskal-Wallis test to test the null hypothesis that the five varieties have the same yield distributions.
d. (2 Points) Are the conclusions you reached in (b) and (c) consistent?
SPSS Output for question #6

Tests of Normality

Yield (in pounds)
                 Kolmogorov-Smirnov (a)         Shapiro-Wilk
Code (Yields)    Statistic   df   Sig.          Statistic   df   Sig.
A                .182        7    .200*         .915        7    .428
B                .227        7    .200*         .884        7    .243
C                .161        7    .200*         .958        7    .804
D                .239        7    .200*         .866        7    .172
E                .144        7    .200*         .985        7    .980

*. This is a lower bound of the true significance.
a. Lilliefors Significance Correction

[Figure: Boxplots of Yield (in pounds) by Code (Yields) for varieties A, B, C, D and E, with N = 7 trees per variety.]

Test of Homogeneity of Variances

Yield (in pounds)
Levene Statistic   df1   df2   Sig.
5.214              4     30    .003
ANOVA

Yield (in pounds)
                  Sum of Squares   df   Mean Square   F       Sig.
Between Groups    1096.743         4    274.186       3.730   .014
Within Groups     2205.429         30   73.514
Total             3302.171         34

Ranks

Yield (in pounds)
Code (Yields)   N     Mean Rank
A               7     11.64
B               7     21.36
C               7     26.79
D               7     13.64
E               7     16.57
Total           35

Test Statistics (a,b)

              Yield (in pounds)
Chi-Square    10.011
df            4
Asymp. Sig.   .040

a. Kruskal Wallis Test
b. Grouping Variable: Code (Yields)
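Both test statistics for Question #6 can be approximated from the summary tables alone. The Python sketch below recomputes the ANOVA F ratio from the sums of squares and the Kruskal-Wallis H statistic from the mean ranks. Two assumptions should be kept in mind: the mean ranks are rounded to two decimals in the output, and no correction for ties is applied, so H is expected to land close to, but not exactly on, the SPSS value.

```python
# Values copied from the SPSS ANOVA table.
ss_between, df_between = 1096.743, 4
ss_within, df_within = 2205.429, 30
f_ratio = (ss_between / df_between) / (ss_within / df_within)

# Mean ranks copied from the SPSS Ranks table (rounded to two decimals).
mean_ranks = {"A": 11.64, "B": 21.36, "C": 26.79, "D": 13.64, "E": 16.57}
n_per_group = 7
n_total = n_per_group * len(mean_ranks)

# Under H0 every group's mean rank is expected to equal (N + 1) / 2.
grand_mean_rank = (n_total + 1) / 2

# Kruskal-Wallis H without tie correction.
h_stat = 12 / (n_total * (n_total + 1)) * sum(
    n_per_group * (rank - grand_mean_rank) ** 2
    for rank in mean_ranks.values()
)

# Decisions at the alpha stated in the question, using the SPSS p-values.
alpha = 0.01
reject_anova = 0.014 < alpha
reject_kruskal_wallis = 0.040 < alpha

print("F:", round(f_ratio, 3))
print("H (approx.):", round(h_stat, 2))
print("Reject H0 with ANOVA?", reject_anova)
print("Reject H0 with Kruskal-Wallis?", reject_kruskal_wallis)
```

Note that the decision flips if α = .05 were used instead, which is why the significance level stated in the question matters for part (d).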
Question #7: (11 Points)
A personnel director for a large, research-oriented firm categorizes colleges and graduates. The director collects data on 156 recent graduates, and has each rated by a supervisor.

                   Rating
School             Outstanding   Average   Poor
Most desirable     21            25        2
Good               20            36        10
Adequate           4             14        7
Undesirable        3             8         6

Using α = .01 and the SPSS output, give an interpretation for each of the following:

a. (8 Points) Can the director safely conclude that there is a relation between school type and rating?
b. (3 Points) Is there any problem in using the χ² approximation?
SPSS Output for question #7

SCHOOL * RATING Crosstabulation

                                      RATING
SCHOOL                                Outstanding   Average   Poor    Total
Most Desirable    Count               21            25        2       48
                  Expected Count      14.8          25.5      7.7     48.0
Good              Count               20            36        10      66
                  Expected Count      20.3          35.1      10.6    66.0
Adequate          Count               4             14        7       25
                  Expected Count      7.7           13.3      4.0     25.0
Undesirable       Count               3             8         6       17
                  Expected Count      5.2           9.0       2.7     17.0
Total             Count               48            83        25      156
                  Expected Count      48.0          83.0      25.0    156.0

Chi-Square Tests

                                 Value        df   Asymp. Sig. (2-sided)
Pearson Chi-Square               15.967 (a)   6    .014
Likelihood Ratio                 16.577       6    .011
Linear-by-Linear Association     13.934       1    .000
N of Valid Cases                 156

a. 2 cells (16.7%) have expected count less than 5. The minimum expected count is 2.72.
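The expected counts and the Pearson chi-square statistic for Question #7 follow directly from the observed counts in the crosstabulation. The Python sketch below is a plain standard-library illustration of that calculation; the counts are taken from the SPSS Count rows, and the variable names are chosen for this example only.

```python
# Observed counts from the SPSS crosstabulation.
# Columns: Outstanding, Average, Poor.
observed = [
    [21, 25, 2],   # Most desirable
    [20, 36, 10],  # Good
    [4, 14, 7],    # Adequate
    [3, 8, 6],     # Undesirable
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: row total * column total / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Pearson chi-square: sum over cells of (observed - expected)^2 / expected.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Check the usual rule of thumb for the chi-square approximation.
small_cells = sum(e < 5 for exp_row in expected for e in exp_row)
min_expected = min(e for exp_row in expected for e in exp_row)

print("n:", n, "df:", df)
print("Pearson chi-square:", round(chi_square, 3))
print("Cells with expected count < 5:", small_cells)
print("Minimum expected count:", round(min_expected, 2))
```

The count of small expected cells is what footnote (a) of the Chi-Square Tests table reports, and it is the quantity to discuss in part (b).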