International Statistical Literacy Competition of the ISLP
http://www.stat.auckland.ac.nz/~iase/islp/competition
Training package 3

1.- Drinking Soda and Bone Health
http://figurethis.org/

2.- Comparing Archaeological Sites

3.- Estimating Chances of Winning given some information

4.- How many fish?

5.- Language

6.- How much time do Teens spend on the job?
7.- Sleeping Time

How much time do students in your school sleep at night? Do girls show a significantly different pattern of sleep than boys? What do you think would explain a difference if there was one?

Do a survey of your class and record the following variables for each student:

Sleep = sleep time last night, in hours
Sex = boy or girl
Mystery = the variable that you think might explain a difference, if one exists

Determine with a graph and the relevant summary statistics whether there is a difference. Is the difference significant? Could you extrapolate your conclusions for the class to other boys and girls in the larger world out there? Were you right in your hypothesis about the reason for a difference, if there is any?

To Teacher

This is a very open-ended question: the students have to prepare a questionnaire, do the survey, collect the data, and realize that measures of central tendency and variability have to be used to compare the two groups. They have to decide how to tabulate their data set, how to summarize it variable by variable, and how to show the relationship between sex and sleeping time. Box plots, histograms and other creative graphs can be used for the comparison. Have the students write a summary of what they found.
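As a rough illustration of the comparison described above, the following sketch summarizes sleep time separately for each group. The survey records are HYPOTHETICAL placeholders, not real data, and the names `survey` and `summarize` are invented for this example; a class would substitute its own records.

```python
from statistics import mean, median, quantiles, stdev

# HYPOTHETICAL survey records (sleep in hours, sex); replace with your class data.
survey = [
    (8.0, "girl"), (7.5, "girl"), (6.5, "girl"), (9.0, "girl"), (7.0, "girl"),
    (7.0, "boy"), (6.0, "boy"), (8.5, "boy"), (7.5, "boy"), (6.5, "boy"),
]

def summarize(values):
    """Return measures of center and spread for one group."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "n": len(values),
        "mean": round(mean(values), 2),
        "median": median(values),
        "sd": round(stdev(values), 2),
        "iqr": q3 - q1,
    }

# Split the sleep times by sex, then summarize each group separately.
groups = {}
for sleep, sex in survey:
    groups.setdefault(sex, []).append(sleep)

summaries = {sex: summarize(hours) for sex, hours in groups.items()}
for sex, stats in summaries.items():
    print(sex, stats)
```

Comparing the medians and interquartile ranges side by side is the numerical counterpart of comparative box plots. With a class-sized sample, any difference found this way describes the class only.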
8.- From CensusAtSchool New Zealand
http://www.censusatschool.org.nz/
Information provided in the next two pages.

TEACHER NOTES PART B
9.- The Effectiveness of Captopril

1.- Introduction

Pharmaceutical manufacturers go through a very rigorous process in order to get their drugs approved for sale. The process is designed to determine whether or not the drug works. A variety of factors make this more difficult than it might seem at first blush. One factor is that different people have different reactions to the same drug, so it's not true that a drug simply works or doesn't work. In truth, it will have a different effect on different people, and therefore drug manufacturers have to convince the government that it works on average, loosely speaking.

In this activity, we will examine a study designed to assess the effectiveness of the drug Captopril in lowering blood pressure. We will focus only on Captopril's effect on systolic blood pressure. The study, reported in the British Medical Journal in 1979, examined 15 patients. Each patient had his or her blood pressure measured, was given the drug, and then had his or her blood pressure taken again several minutes later. The fifteen patients were given the same dosage of the drug.

2.- The Data

To test the effectiveness of Captopril in lowering blood pressure, we need the blood pressure measurements before the patients took the drug and the measurements after they took it. The data set containing this information is given below. For each patient, the variable called before is the systolic blood pressure before Captopril was administered, and the variable called after is the systolic blood pressure after Captopril was administered.

Patient  before  after
1        130     125
2        122     121
3        124     121
4        104     106
5        112     101
6        101      85
7        121      98
8        124     105
9        115     103
10       102      98
11        98      90
12       119      98
13       106     110
14       107     103
15       100      82
Question 1: What type of variables are before and after?

Question 2: Describe the distribution of blood pressures of the sample before taking the drug. Use summary statistics, a histogram and a box plot, and support your answer with them. Refer both to spread and to typical values. In view of the distribution and the summary statistics you get, which statistics do you think are the most appropriate to summarize the distribution?

Question 3: Describe the distribution of blood pressures of the sample after taking the drug. Use summary statistics, a histogram and a box plot, and support your answer with them. Refer both to spread and to typical values. In view of the distribution and the summary statistics you get, which statistics do you think are the most appropriate to summarize the distribution? Copy-paste your histogram, your box plot and your summary statistics and write your answer.

Question 4: Our research question is whether Captopril is effective in lowering systolic blood pressure, so what we really need is to examine before and after simultaneously. Do comparative box plots and comparative summary statistics, and determine whether Captopril was effective in lowering blood pressure. Explain your answer and support it with the summary statistics and the box plots.

Question 5: Do you think all the participants changed by the same amount? If so, explain. If not, which patients do you think changed the most? Which changed the least? Do you think it's possible that anybody's blood pressure increased? Can you answer these questions with any of the graphs we have used?

Question 7: So far we have looked at descriptions of the blood pressures before taking the drug separately from the blood pressures after. But it would be nice to know who in the group changed, and in which direction. These questions can be answered with a little more effort. These data are what we call paired: every individual who contributes a value to the first variable also contributes an observation to the second variable. We can therefore focus our investigation on the change. We will create a new variable called difference (difference = BP after - BP before) and look at its histogram. Describe the distribution of the variable difference and comment on the following: What was the greatest change? What was the least change? What was a typical amount of change? Did anybody's blood pressure increase? By how much? Did anyone show no change? Support your answers with the graphs and the numbers in the data. We will also do a stem and leaf plot.
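The construction of the paired variable can be sketched as follows. This is a minimal illustration that assumes the data table is typed in as two lists in patient order; the list names mirror the variable names used in the activity.

```python
# Systolic blood pressure for the 15 patients, in patient order (from the data table).
before = [130, 122, 124, 104, 112, 101, 121, 124, 115, 102, 98, 119, 106, 107, 100]
after = [125, 121, 121, 106, 101, 85, 98, 105, 103, 98, 90, 98, 110, 103, 82]

# Paired data: the i-th entries of both lists belong to the same patient,
# so the change is computed patient by patient as after minus before.
difference = [a - b for a, b in zip(after, before)]

for patient, d in enumerate(difference, start=1):
    print(f"patient {patient:2d}: {d:+d}")
```

A negative value means that patient's blood pressure fell after taking the drug; a positive value means it rose.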
Question 8: How many people saw their blood pressure fall by 20 or more? 2

Question 9: If Captopril were ineffective, about where would you expect the center of the distribution of the variable difference to be?
Teacher

Question 1: Before is a quantitative variable, and after is a quantitative variable.

Question 2:

Min.   1st Qu.  Median  Mean   3rd Qu.  Max.
98.0   103.0    112.0   112.3  121.5    130.0

sd = 10.47219, IQR = 18.5

According to the histogram and summary statistics, before Captopril, blood pressure ranged from 98 to 130, with 50% of the people having between 103 and 121.5, and 25% having more than 121.5. It looks like we have a bimodal distribution, with some people in the 100-105 range and others in the 120-125 range. However, notice that there are too many bins for the number of cases, so the bimodality is just an artifact of having so many bins. The middle 50%, between 103 and 121.5, really gives the typical values. Judging by the standard deviation, the spread is quite large. There are no outliers. The mean and median are very close, so we could just use the mean and standard deviation as measures of center and spread.
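The summary statistics above can be reproduced with a short sketch using only Python's standard library (Python 3.8 or later). The choice of `method="inclusive"` for the quartiles is an assumption about the convention behind the teacher's output; it does reproduce the quartiles 103.0 and 121.5 shown here, whereas other quartile conventions can give slightly different values.

```python
from statistics import mean, median, quantiles, stdev

# Systolic blood pressure before Captopril, in patient order (from the data table).
before = [130, 122, 124, 104, 112, 101, 121, 124, 115, 102, 98, 119, 106, 107, 100]

# Quartiles by linear interpolation between the sorted observations.
q1, _, q3 = quantiles(before, n=4, method="inclusive")

summary = {
    "min": min(before),
    "q1": q1,
    "median": median(before),
    "mean": mean(before),
    "q3": q3,
    "max": max(before),
    "sd": stdev(before),   # sample standard deviation (divides by n - 1)
    "iqr": q3 - q1,
}

for name, value in summary.items():
    print(f"{name:>6}: {value:.2f}")
```

The same code applied to the after column gives the statistics used in Question 3.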
Question 3:

Min.   1st Qu.  Median  Mean   3rd Qu.  Max.
82.0   98.0     103.0   103.1  108.0    125.0

sd = 12.5554, IQR = 10

According to the histogram and summary statistics, after Captopril, most of the blood pressures ranged from 82 to 110, with 50% of the people having between 98 and 108, and 25% having more than 108. Nobody has between 110 and 120, so the 120-125 group accounts for most of that upper 25% and looks rather different from the rest of the group. The values from 82 to 110 are therefore really the typical ones. Judging by the standard deviation, the spread is quite large. With bimodality, the box plot is not very helpful, but it confirms the range and interquartile range seen in the histogram and the summary statistics.
The mean is really close to the median, and the shape of the histogram does not help to conclude skewness one way or the other, so the mean and standard deviation are as good here as the median and interquartile range.

Question 4

     before            after
 Min.   : 98.0    Min.   : 82.0
 1st Qu.:103.0    1st Qu.: 98.0
 Median :112.0    Median :103.0
 Mean   :112.3    Mean   :103.1
 3rd Qu.:121.5    3rd Qu.:108.0
 Max.   :130.0    Max.   :125.0

Looking at the summary statistics and the box plot, we can see that all summary statistics are lower after than before: the median is lower, the 1st quartile is lower, etc. The interquartile range is lower after than before, too, suggesting that the patients' blood pressures are much more concentrated around the median after than before, forming a more homogeneous group. The reason we have a higher standard deviation after is that there are two outliers, one at the upper end and another at the lower end. According to all this information, Captopril was effective in decreasing blood pressure.

Question 7

The histogram shows that most patients saw their blood pressure decrease. Only 2 had blood pressure that increased.

What was the greatest change? A decrease of 23 (-23).
What was the least change? A decrease of 1 (-1).
What was a typical amount of change? A decrease of 15-20 (-15 to -20) or a decrease of 0-5 (0 to -5).
Did anybody's blood pressure increase? Yes.
By how much? By 4 and by 2.
Did anyone show no change? No, there are no 0s.
> stem(change)

  The decimal point is 1 digit(s) to the right of the |

  -2 | 31
  -1 | 98621
  -0 | 854431
   0 | 24

Students can see all these things by creating another column in the data table, which they obtain by subtracting before from after, and by looking at the histogram of the variable difference.

Note: difference refers to the change in blood pressure (BP after - BP before).
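For readers without access to the software used above, the stem and leaf plot can be rebuilt by hand or with a small sketch like the one below. The function `stem_and_leaf` is a hypothetical helper written for this example, not a library routine, and it is simplified: it handles whole numbers only, uses tens as stems, and omits stems that have no leaves.

```python
from collections import defaultdict

before = [130, 122, 124, 104, 112, 101, 121, 124, 115, 102, 98, 119, 106, 107, 100]
after = [125, 121, 121, 106, 101, 85, 98, 105, 103, 98, 90, 98, 110, 103, 82]
difference = [a - b for a, b in zip(after, before)]

def stem_and_leaf(values):
    """Return stem-and-leaf rows for integers: tens are stems, units are leaves."""
    leaves = defaultdict(list)
    for v in values:
        sign = -1 if v < 0 else 1
        stem, leaf = divmod(abs(v), 10)
        # Track the sign separately so -0 (values -9..-1) and 0 (values 0..9) stay distinct.
        leaves[(sign, stem)].append(leaf)
    rows = []
    # Order stems from most negative to most positive.
    for sign, stem in sorted(leaves, key=lambda k: k[0] * (k[1] + 0.5)):
        # On negative stems, larger leaves are smaller values, so list them first.
        row = sorted(leaves[(sign, stem)], reverse=(sign < 0))
        label = ("-" if sign < 0 else "") + str(stem)
        rows.append(f"{label:>3} | " + "".join(str(leaf) for leaf in row))
    return rows

plot = stem_and_leaf(difference)
print("\n".join(plot))
```

Reading a row such as `-2 | 31` as the values -23 and -21 shows directly which patients had the largest decreases.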
10.- The media

NATIONAL
School attendance can lower risk of HIV, study shows
Johannesburg, South Africa
17 January 2008 12:52
Activity for learners

1.- Describe how the research study described in this news article was done. Could this kind of study have been done through an experiment (a clinical trial style study)?
2.- Describe the conclusions of the study.
3.- Identify things in a person that may lead the person to be cautious about AIDS.
4.- Rewrite the conclusion of this article based on what you know about how statistical studies should be done to establish causality.
11.- Bullying

Example 4.3, Teachers Notes FET Phase
By Delia North

Suppose that learners at various schools are interviewed to establish whether they feel that the school has taken the necessary steps to protect them against bullying from older children. Children from rural and urban schools are interviewed. Suppose that 40 learners at rural schools felt that the school took steps to protect them against bullying from older children, while 51 learners at rural schools did not think that they were adequately protected against bullying by older children. In urban schools, 64 learners felt that the school protected them against bullying, whilst 34 did not think so.

(a) Set up a 2 x 2 contingency table to reflect the frequencies as given above.
(b) Set up a 2 x 2 contingency table with all probabilities in the appropriate cells, so as to answer:
(i) What is the probability that a randomly chosen learner is from a rural school?
(ii) What is the probability that a randomly chosen learner does not think his school takes adequate steps to protect him against bullying?
(iii) What is the probability that a randomly chosen learner is from a rural school and feels that the school does take steps to protect him against bullying from older children?
(iv) What is the probability that a randomly chosen learner does not feel that the school takes adequate steps to protect him against bullying, given he is from a rural school?
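For teachers who want to check the table, one possible way to work through parts (a) and (b) is sketched below. It uses exact fractions so that the probabilities are not rounded. The dictionary layout and the helper `prob` are choices made for this illustration, not part of the original example.

```python
from fractions import Fraction

# Frequencies stated in the exercise: (school type, feels protected?) -> count.
counts = {
    ("rural", "yes"): 40,
    ("rural", "no"): 51,
    ("urban", "yes"): 64,
    ("urban", "no"): 34,
}
total = sum(counts.values())

def prob(school=None, answer=None):
    """Probability that a random learner matches the given school and/or answer."""
    matching = sum(
        n for (s, a), n in counts.items()
        if (school is None or s == school) and (answer is None or a == answer)
    )
    return Fraction(matching, total)

p_rural = prob(school="rural")                                   # (i)
p_no = prob(answer="no")                                         # (ii)
p_rural_and_yes = prob(school="rural", answer="yes")             # (iii)
# (iv) Conditional probability: P(no | rural) = P(rural and no) / P(rural).
p_no_given_rural = prob(school="rural", answer="no") / p_rural

for label, p in [("(i)", p_rural), ("(ii)", p_no),
                 ("(iii)", p_rural_and_yes), ("(iv)", p_no_given_rural)]:
    print(label, p, round(float(p), 3))
```

Note that part (iv) divides by the number of rural learners rather than by the overall total, which is the step learners most often miss when reading a conditional probability off a contingency table.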