STAB22 Statistics I. Lecture 12


Midterm Grades

[Figure: histogram of midterm marks; x-axis: Marks ( / 40 ), y-axis: Frequency]

              Min    Q1   Median   Q3    Max
Marks / 40      7    26     32     36     40
Marks / 100  17.5    65     80     90    100

Example (Sample Survey)
The HR department of a large firm wants to know the average job satisfaction rate of its employees, so it emails an online questionnaire to 100 randomly selected employees.
Find:
- Population
- Parameter
- Sample
- Statistic
- Sampling frame
- Sampling method
- Possible sources of bias

Example (Sample Survey)
Identify the sampling method & possible sources of bias for the following cases:
- A course evaluation form is given out to students at the last class of each course
- Traffic patrol officers stop every 50th car crossing an intersection to check license & registration
- For a new movie, 20 theaters were chosen at random, and 5 male & 5 female viewers chosen at random within each theater were interviewed
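The selection mechanisms that appear in these examples can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the frame of 1000 employee IDs, the 60 theaters, and the 30 viewers of each sex per theater are hypothetical numbers, not part of the examples.

```python
import random

rng = random.Random(22)  # fixed seed so the sketch is reproducible

# Hypothetical sampling frame: 1000 employee IDs (illustrative only).
frame = list(range(1, 1001))

# Simple random sample: every set of 100 employees is equally likely.
srs = rng.sample(frame, 100)

# Systematic sample: random start, then every 50th unit (as with the cars).
k = 50
start = rng.randrange(k)
systematic = frame[start::k]

# Multistage sample: choose 20 theaters at random, then 5 men and 5 women
# at random within each chosen theater.
# Hypothetical population: 60 theaters, 30 viewers of each sex per theater.
theaters = {
    t: {"M": [f"T{t}-M{i}" for i in range(30)],
        "F": [f"T{t}-F{i}" for i in range(30)]}
    for t in range(1, 61)
}
chosen = rng.sample(sorted(theaters), 20)
multistage = {
    t: rng.sample(theaters[t]["M"], 5) + rng.sample(theaters[t]["F"], 5)
    for t in chosen
}

# Sample sizes: 100 employees, 20 cars, 20 * 10 = 200 viewers.
print(len(srs), len(systematic), sum(len(v) for v in multistage.values()))
```

The code only shows how units get selected. It says nothing about bias, which depends on who is missing from the frame and who fails to respond.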

Collecting Data
Three basic data collection methods:
- Sample surveys
- Observational studies: observe individuals & measure variables of interest, without any control over their response
- Experiments: impose different treatments on individuals in order to measure & compare their effects

Observational Studies
Want to find how consumers like brand XYZ
- Could perform a sample survey: go and ask a sample of consumers directly
  - Typically high cost in time & $
- Instead, could look at sales/customer records from retailers (called an observational study)
  - Don't ask individuals, just observe them
  - Typically cheaper than a sample survey
  - However, cannot assign questions/choices

Observational Studies
Two basic types of observational studies:
- Retrospective study: pick individuals & extract historical (i.e. past) data on them
  - Pro: doesn't take time, can span longer periods
  - Con: cannot control/correct past data recording
- Prospective study: pick individuals & collect data as events happen over time
  - Pro: have control of the observation process
  - Con: more time consuming

Observational Studies
- Observational studies are also useful for finding trends & possible relationships between variables
  - E.g. Medical records indicate people who exercise regularly suffer less from insomnia
- However, observational studies do not demonstrate a causal relationship
  - E.g. It is not necessarily true that exercise reduces insomnia (they are just associated)
- In order to demonstrate a causal relationship, need to perform an experiment

Experiments
How do we establish cause and effect?
- Randomly select some subjects & instruct them to exercise, and instruct the remaining subjects not to exercise; then assess & compare insomnia for both groups
How does this help?
- Choosing the two groups at random means they start out relatively equal in terms of any characteristic that might matter for insomnia
- If the groups end up with clearly unequal insomnia, this is evidence that exercise made a difference
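A small simulation can make the "start out relatively equal" point concrete. The sketch below is illustrative only: the 200 subjects and their ages are made-up data, and age stands in for any characteristic that might matter for insomnia.

```python
import random
import statistics

rng = random.Random(12)  # fixed seed so the sketch is reproducible

# Hypothetical subjects: each has an age, a characteristic the
# experimenter does not set but that might affect insomnia.
ages = [rng.randint(18, 80) for _ in range(200)]

# Random assignment: shuffle the subject indices, then split in half.
ids = list(range(len(ages)))
rng.shuffle(ids)
exercise, no_exercise = ids[:100], ids[100:]

mean_ex = statistics.mean(ages[i] for i in exercise)
mean_no = statistics.mean(ages[i] for i in no_exercise)
print(f"mean age, exercise group:    {mean_ex:.1f}")
print(f"mean age, no-exercise group: {mean_no:.1f}")
```

With groups of this size the two mean ages typically come out close, even though age played no part in the assignment. The same balancing applies to characteristics nobody thought to record.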

Experiment Terminology
- Experimental units / subjects: individuals participating in the experiment
- Factor: explanatory variable whose level can be manipulated by the experimenter
- Levels: specific values chosen for a factor
- Treatment: specific combination of manipulated levels of one or more factors
- Response: variable whose values are compared across treatments
- Statistically significant: a factor effect so large that it would rarely occur by chance

Example
Look at Insomnia vs Exercise & Diet
- Exercise: none, moderate, strenuous
- Diet: Veg. (vegetarian), Non-veg.
Identify:
- Factors
- Levels
- # of treatments
- How to assign subjects?
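Since a treatment is a combination of one level from each factor, the treatments in this example can be listed mechanically. A minimal sketch (the variable names are illustrative):

```python
from itertools import product

# Levels of the two factors, as given in the example.
exercise_levels = ["none", "moderate", "strenuous"]
diet_levels = ["vegetarian", "non-vegetarian"]

# Each treatment pairs one exercise level with one diet level.
treatments = list(product(exercise_levels, diet_levels))
for number, (exercise, diet) in enumerate(treatments, start=1):
    print(number, exercise, diet)

print("number of treatments:", len(treatments))  # 3 * 2 = 6
```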

Principles of Experimental Design
1. Control: control sources of variation, besides the factors, by making conditions as similar as possible for all treatment groups
2. Randomize: helps equalize the effects of unknown/uncontrollable sources of variation
Note: Randomization does not eliminate the effects of these sources, but tries to spread them out across the treatment levels so that we can see past them

Principles of Experimental Design
3. Replicate: get several measurements of the response for each treatment
4. Blocking: for variables we can identify but cannot control and which affect the response, divide subjects into groups with the same variable values (a.k.a. blocks) and randomize within each block
Blocking removes much of the variability due to differences among the blocks.

Experimental Designs
- Completely Randomized Design (CRD): all experimental units are allocated at random among all treatments
- Randomized Block Design (RBD): random assignment of units to treatments is carried out separately within each block
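One way to carry out a completely randomized design is to shuffle the units and deal them out to the treatments in turn. The sketch below assumes 12 hypothetical subjects and three treatments labelled A, B, C. The helper `completely_randomized` is an illustrative name, not a library function.

```python
import random

def completely_randomized(units, treatments, rng):
    """Shuffle the units, then deal them out to the treatments in turn."""
    shuffled = list(units)
    rng.shuffle(shuffled)
    assignment = {t: [] for t in treatments}
    for position, unit in enumerate(shuffled):
        assignment[treatments[position % len(treatments)]].append(unit)
    return assignment

rng = random.Random(14)  # fixed seed so the sketch is reproducible
units = [f"subject{i:02d}" for i in range(1, 13)]  # hypothetical subjects
treatments = ["A", "B", "C"]                       # hypothetical labels

crd = completely_randomized(units, treatments, rng)
for t, group in crd.items():
    print(t, sorted(group))
```

For a randomized block design, the same helper would be called once per block, so that each block is spread across all treatments.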

Blocking
- Assume 36 adult & 6 child participants in the previous insomnia experiment
- If you just randomize, you could get all the children in one treatment
- How would you block on age?
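One possible answer is to randomize separately within the adult block and the child block. The sketch below assumes that the "previous insomnia experiment" is the one with six treatments (3 exercise levels x 2 diets). The subject labels are hypothetical.

```python
import random
from itertools import product

rng = random.Random(15)  # fixed seed so the sketch is reproducible

# Assumption: six treatments from exercise (3 levels) x diet (2 levels).
treatments = list(product(["none", "moderate", "strenuous"],
                          ["veg", "non-veg"]))

# Two blocks defined by age group (hypothetical subject labels).
blocks = {
    "adult": [f"adult{i:02d}" for i in range(1, 37)],
    "child": [f"child{i}" for i in range(1, 7)],
}

# Randomize within each block: shuffle the block, then deal its members
# out to the treatments in turn.
assignment = {t: [] for t in treatments}
for members in blocks.values():
    shuffled = members[:]
    rng.shuffle(shuffled)
    for position, subject in enumerate(shuffled):
        assignment[treatments[position % len(treatments)]].append(subject)

for t, group in assignment.items():
    children = sum(s.startswith("child") for s in group)
    print(t, "adults:", len(group) - children, "children:", children)
```

By construction every treatment receives 6 adults and 1 child, so the children can no longer all land in the same treatment.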

Experiments
- Placebo: fake treatment designed to look like a real one, used when just the knowledge of receiving any treatment can affect the response
  - E.g. Used to test the effectiveness of pain medication
- For comparing results, often use the current standard treatment as a baseline
- Subjects getting the placebo/standard treatment are called the control group

Blinding
- Knowledge of the assigned treatment can often influence the assessment of the response
- Two classes of individuals can affect the experiment:
  - Those who influence results (e.g. subjects, nurses)
  - Those who evaluate results (e.g. physicians, judges)
- Blinding avoids bias from knowing the treatment
  - Single-blind: every individual in one of the two classes doesn't know the treatments
  - Double-blind: every individual in both classes doesn't know the treatments

Experiments
- Gold standard for experiments: randomized, comparative, double-blind, placebo-controlled
- Even so, could have confounding problems
- Confounding variable: variable associated with both the factor & the response
  - Cannot tell whether the effect on the response is caused by the factor or by the confounding variable
  - E.g. A subject's weight might be associated with both insomnia & diet (veg/non-veg)