Introduction, Evidence, and Sampling

Motivation: Why analyze data?

Clinical trials/drug development: compare existing treatments with new methods
Agriculture: enhance crop yields, improve pest resistance
Ecology: study how ecosystems develop or respond to environmental impacts
Lab studies: learn more about biological tissue/cellular activity

Statistics is the science of collecting, summarizing, analyzing, and interpreting data. Our goal is to understand the underlying biological phenomena that generate the data.

When summarizing data, the life scientist must be able to identify patterns. This is not as easy as it sounds: no matter how well we control experimental conditions, there will always be a certain amount of variability.

Example 1.1.1
In 1881, Louis Pasteur inoculated 48 sheep with anthrax after vaccinating 24 of them. Results are summarized in Table 1.1.1. Describe the variability in survival outcomes.

Example 1.1.2
A group of mice with a naturally high occurrence of liver tumors were randomly allocated either to be exposed to Escherichia coli (E. coli) or to live in a germ-free environment. The data are summarized in Table 1.1.2. Describe the variability in outcomes. What can you infer?
We will learn different statistical methods for quantifying results later in the semester, but first we must pay attention to the way the data were collected.

Example 1.1.5
An entomologist studied whether Sitona larvae would preferentially choose nodulated alfalfa roots or roots where nodulation was suppressed. After 24 hours, the number of larvae (out of the 120 released into the alfalfa) that made a clear choice between the two root types was recorded. Results are summarized in Table 1.1.5. What questions might you have when presented with these results?

How would each of the following layouts affect the results of the experiment? Is one better than the other? (checkerboard = nodulated; dots = not nodulated)

Chapter 1 Page 2
Example 1.1.4
The enzyme monoamine oxidase (MAO) is thought to play a role in behavior. MAO activity was measured in the blood platelets of 42 schizophrenic patients. The results are summarized in Table 1.1.4 and Figure 1.1.2. Which is preferable: the tabular display or the graphical display? Based on these summaries, how would you describe the relationship between MAO activity and schizophrenia diagnosis?
Example 1.1.6
Fat-free body mass was determined for each of 7 men using underwater weighing techniques. Twenty-four-hour energy expenditure was measured under sedentary conditions twice for each subject. Results are summarized in Table 1.1.6 and Figure 1.1.4. What is the goal of this analysis? Name some important factors to consider when conducting this analysis. What are the sources of variability?
Types of Evidence

Goal: We want to gather evidence (collect data) that leads to results that are believable and repeatable. We will talk throughout the semester about how to accomplish these goals.

So, how do we gather this evidence? Two main types of data collection:

Observational Study: ________

Experiment: ________

Example 1.2.2
Is there a genetic basis for sexual orientation? Researchers measured the midsagittal area of the anterior commissure (AC) of the brain in 30 homosexual males, heterosexual males, and heterosexual females. The data are summarized in Table 1.2.1 and Figure 1.2.1. Which type of study is this? What conclusions can we draw about the size of the AC and sexual orientation?
Example 1.2.4
Before being tested on humans, the toxicity of a certain drug was tested on 8 female and 8 male dogs. The females were randomly allocated into two groups of 4 dogs, one receiving 8 mg/kg and the other 25 mg/kg of the drug; the males were allocated in the same way. The alkaline phosphatase level was measured after the dogs received the drug. Results are summarized in Table 1.2.2 and Figure 1.2.2. What type of study is this? What variables are included in this study? Describe the purpose of including each variable. What conclusions appear to be supported by these summaries?
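The random allocation described in Example 1.2.4 can be sketched in a few lines of Python. This is only an illustration of allocating within each sex, not the procedure the original researchers used: the dog labels (F1..F8, M1..M8), the function name `allocate_within_sex`, and the seeds are all hypothetical.

```python
import random

def allocate_within_sex(dog_ids, doses=(8, 25), seed=None):
    """Randomly split dog_ids into equal-sized groups, one per dose (mg/kg)."""
    rng = random.Random(seed)          # seeded only so the sketch is reproducible
    shuffled = list(dog_ids)
    rng.shuffle(shuffled)              # random order = random allocation
    size = len(shuffled) // len(doses)
    return {
        dose: sorted(shuffled[i * size:(i + 1) * size])
        for i, dose in enumerate(doses)
    }

# Hypothetical labels for the 8 female and 8 male dogs.
females = [f"F{i}" for i in range(1, 9)]
males = [f"M{i}" for i in range(1, 9)]

# Allocate separately within each sex, so each dose gets 4 females and 4 males.
allocation = {
    "female": allocate_within_sex(females, seed=1),
    "male": allocate_within_sex(males, seed=2),
}

for sex, groups in allocation.items():
    for dose, dogs in groups.items():
        print(f"{sex:6s} {dose:2d} mg/kg: {dogs}")
```

Allocating within each sex, rather than shuffling all 16 dogs together, guarantees that both dose groups contain the same number of males and females, so a difference between doses cannot be explained by an imbalance in sex.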
Other topics and terms

Blinding
An experiment is said to be single blind (double blind) if at least one of (both) ________

A response variable in a study is a variable being explained by one or more variables.

A ________ variable is one that has an effect on the response but is not accounted for as an explanatory variable in the study.

Control Groups
Usually in the life sciences we compare two or more groups to one another. Often, one of the groups is a control group. A control group is used as ________

Frequently, the control group is the group receiving a placebo. A placebo is a treatment that has no drug/therapy associated with it. If the placebo is a pill, it contains only inert ingredients. When patients receiving a placebo respond favorably, this is said to be the placebo effect.

There are many controversies surrounding the need for control groups (and placebos). Why do we need them? Are there issues of ethics?

Historical controls
Random Sampling

A population is the larger group of individuals/subjects/organisms about which we wish to draw a conclusion. It is almost never feasible in realistic settings to observe the entire population, so we choose a representative subset of the population, called the sample. The size of the sample is almost always denoted by n.

Simple Random Sample
Definition: A simple random sample of n items is a data set where (a) every population element has an equal chance of selection, and (b) every population element is chosen independently of every other element.

This draws upon the larger concept of randomization: selection of data that avoids sources of possible bias. The simple random sample is the same idea as drawing names out of a hat: all names are in the hat and a few are drawn out.

The simple random sample is at the heart of every sampling scheme, but it is too simple to be a one-size-fits-all solution to sampling. Two other common sampling schemes are:

Cluster Sampling
Divide the population (sampling frame) into clusters, usually groups formed by geographical location or another clustering criterion that saves resources; then sample whole clusters.

Stratified Random Sampling
First, divide the population (sampling frame) into strata: groups of similar individuals. Then sample a specified number of individuals from each stratum.
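The three sampling schemes can be compared side by side with a short Python sketch. Everything in it is made up for illustration: the population of 60 labeled plants, the six "fields" of 10 plants each, the function names, and the sample sizes. The same fields serve as both clusters and strata only to keep the example short; in practice strata would be groups of similar individuals. The simple random sample here is drawn without replacement, like names from a hat.

```python
import random

def simple_random_sample(population, n, rng):
    """Draw n distinct units; every unit has the same chance of selection."""
    return rng.sample(population, n)

def cluster_sample(clusters, n_clusters, rng):
    """Pick whole clusters at random and keep every unit inside them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]

def stratified_sample(strata, n_per_stratum, rng):
    """Draw a simple random sample of fixed size within each stratum."""
    return {name: rng.sample(units, n_per_stratum)
            for name, units in strata.items()}

rng = random.Random(0)  # seeded only so the sketch is reproducible

# Hypothetical sampling frame: 60 plants grouped into 6 fields of 10.
population = [f"plant_{i}" for i in range(1, 61)]
fields = {f"field_{k + 1}": population[10 * k:10 * (k + 1)] for k in range(6)}

srs = simple_random_sample(population, 12, rng)   # n = 12 plants, any field
clustered = cluster_sample(fields, 2, rng)        # 2 whole fields, 20 plants
stratified = stratified_sample(fields, 2, rng)    # 2 plants from every field

print("SRS:", sorted(srs))
print("Cluster sample size:", len(clustered))
print("Stratified:", stratified)
```

Note the trade-off the sketch makes visible: the cluster sample visits only two fields (cheaper to collect), while the stratified sample is guaranteed to include every field.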