Math 1680 Class Notes. Chapters: 1, 2, 3, 4, 5, 6


Chapter 1. Controlled Experiments

Salk vaccine field trial: a randomized controlled double-blind design

1. Suppose they gave the vaccine to everybody, and the incidence of polio went down. Would that show the vaccine was effective?
2. What if they put the consent group in treatment and the no-consent group in control?
   (a) Would the difference in group sizes matter?
   (b) How do the two groups (consent and no-consent) differ? (Polio is a disease of hygiene.)
3. What about the NFIP design (grade 2 in treatment, grades 1 and 3 in control)?
4. In a proper controlled experiment, should the assignment be done by the toss of a coin, or by expert judgment?

Whenever possible, the control group is given a placebo, which is neutral but resembles the treatment. The response should be to the treatment itself rather than to the idea of treatment.

In a double-blind experiment, the subjects do not know whether they are in treatment or in control; neither do those who evaluate the response. This guards against bias, either in the responses or in the evaluations.

Conventional wisdom dictates that the investigator should control the key variables and randomize the rest.

Chapter 2. Observational Studies

In an observational study, the subjects are assigned to treatment through a process outside the control of the investigator. (Studies on the effects of smoking, for instance, are necessarily observational: nobody is going to smoke for ten years just to please a statistician.)

It is harder to draw conclusions about cause-and-effect relationships from an observational study than from a controlled experiment. The apparent cause and the effect may both be the result of some hidden third factor, a confounder.

A confounder is not just any alternative explanation for an effect. The idea is more subtle: in order for X to confound the association between Y and Z, X has to be associated both with Y and with Z.

Note that if X causes Y, we can also say that X is associated with Y. But if X is associated with Y, we cannot conclude that X causes Y. (A8)

In an observational study, a confounding factor can sometimes be controlled for by comparing smaller groups which are relatively homogeneous with respect to the factor. (Sex Bias in Graduate Admissions on p. 17)

Relationships between percentages in subgroups (for instance, admissions rates for men and women in each department separately) can be reversed when the subgroups are combined. This is called Simpson's paradox.

(Town X) Not an example of Simpson's paradox

Democrat
Ward             A       B       C       All
Total number     750     200     150     1100
Number voting    150     160     110     420
Rate             20%     80%     73.3%   38.2%

Republican
Ward             A       B       C       All
Total number     700     230     120     1050
Number voting    112     164     78      354
Rate             16%     71.3%   65%     33.7%

(Town Y) An example of Simpson's paradox

Democrat
Ward             A       B       C       All
Total number     650     250     155     1055
Number voting    140     190     105     435
Rate             21.5%   76%     67.7%   41.2%

Republican
Ward             A       B       C       All
Total number     120     730     220     1070
Number voting    20      500     140     660
Rate             16.7%   68.5%   63.6%   61.7%
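The two towns can be checked mechanically. The sketch below is only an illustration: the counts are copied from the Town X and Town Y tables, while the function names and the data layout are assumptions made for this example. It treats a town as showing Simpson's paradox when Democrats have the higher turnout rate in every ward but not overall.

```python
# Counts copied from the Town X and Town Y tables.
# Layout (an assumption of this sketch): town -> party -> ward -> (total number, number voting)
towns = {
    "X": {
        "Democrat": {"A": (750, 150), "B": (200, 160), "C": (150, 110)},
        "Republican": {"A": (700, 112), "B": (230, 164), "C": (120, 78)},
    },
    "Y": {
        "Democrat": {"A": (650, 140), "B": (250, 190), "C": (155, 105)},
        "Republican": {"A": (120, 20), "B": (730, 500), "C": (220, 140)},
    },
}


def rate(total, voting):
    """Turnout rate in percent."""
    return 100.0 * voting / total


def overall_rate(wards):
    """Turnout rate in percent after combining all wards."""
    total = sum(t for t, _ in wards.values())
    voting = sum(v for _, v in wards.values())
    return rate(total, voting)


def shows_simpsons_paradox(town):
    """True if Democrats lead in every ward but do not lead overall."""
    dem, rep = town["Democrat"], town["Republican"]
    dem_leads_every_ward = all(rate(*dem[w]) > rate(*rep[w]) for w in dem)
    dem_leads_overall = overall_rate(dem) > overall_rate(rep)
    return dem_leads_every_ward and not dem_leads_overall


for name, town in towns.items():
    print(name, shows_simpsons_paradox(town))
```

In Town Y most Republicans live in wards B and C, where turnout is high for both parties, while most Democrats live in ward A, where turnout is low. That difference in ward mix is what reverses the comparison when the wards are combined.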

Chapter 3. The Histogram

Class Intervals

In a histogram, percentages are represented by areas. In this setup, the height of a histogram shows crowding or density: it is percent per unit length (e.g., % per $1000).

The point of sketching the histogram is usually to show some qualitative feature, such as the weight in the tails. For this, a smooth curve is just as good as the histogram, and is easier on the eye.
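As a small illustration of the density scale, the sketch below computes block heights as percent divided by interval width. The class intervals and percentages are invented for this example and do not come from the notes.

```python
# Hypothetical class intervals in thousands of dollars: (left, right, percent).
# These numbers are made up for illustration only.
intervals = [(0, 10, 20.0), (10, 25, 45.0), (25, 50, 25.0), (50, 100, 10.0)]


def heights(intervals):
    """Height of each block in percent per $1000: percent / width."""
    return [percent / (right - left) for left, right, percent in intervals]


def total_area(intervals):
    """Sum of height x width over all blocks; should come back to 100%."""
    return sum(
        h * (right - left)
        for h, (left, right, _) in zip(heights(intervals), intervals)
    )


print(heights(intervals))
print(total_area(intervals))
```

The widest interval holds 10% of the cases but gets the lowest block, because that 10% is spread over $50,000. Height measures crowding, not the percentage itself.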

Variables:

1. Qualitative
2. Quantitative
   (a) Discrete
   (b) Continuous

Chapter 4. The Average and the Standard Deviation

The average of a list of numbers equals their sum, divided by how many there are.

In a cross-sectional study, different subjects are compared to each other at one point in time. In a longitudinal study, subjects are followed over time, and compared with themselves at different points in time. (There is evidence to suggest that, over time, Americans have been getting taller. This is called the secular trend in height, and its effect is confounded with the effect of aging in figure 3. Most of the two-inch drop in height seems to be due to the secular trend: the people age 65-74 were born around 50 years before those age 18-24, and are an inch or two shorter for that reason.)

A histogram balances when supported at the average. The median of a histogram is the value with half the area to the left and half to the right.

The average is to the right of the median whenever the histogram has a long right-hand tail. When dealing with long-tailed distributions, statisticians might use the median rather than the average, if the average pays too much attention to the extreme tail of the distribution.

The SD measures how far away, on the whole, the numbers are from their average. It is the typical departure from average. For many lists of numbers, about 68% of the entries are within one SD of average, and about 95% are within two SDs. Although this rule isn't exact or universal, it works surprisingly well for many data sets that don't follow the normal curve at all (footnote 9 to the chapter).

The root-mean-square (r.m.s.) operation measures the typical size of the numbers in a list.

SD = r.m.s. deviation from average (E9, E10)
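The definition "SD = r.m.s. deviation from average" can be written out step by step. In the sketch below the data list is invented for illustration, and the function names are assumptions of this example. Note that the SD here divides by the number of entries, as in the definition above, not by one less.

```python
import math
import statistics


def average(xs):
    """Sum of the entries divided by how many there are."""
    return sum(xs) / len(xs)


def rms(xs):
    """Root-mean-square: square the entries, take the mean, take the root."""
    return math.sqrt(sum(x * x for x in xs) / len(xs))


def sd(xs):
    """SD = r.m.s. size of the deviations from the average."""
    avg = average(xs)
    return rms([x - avg for x in xs])


def fraction_within(xs, k):
    """Fraction of entries within k SDs of the average."""
    avg, spread = average(xs), sd(xs)
    return sum(1 for x in xs if abs(x - avg) <= k * spread) / len(xs)


# Hypothetical list with a longer right-hand tail (made up for illustration).
data = [2, 4, 4, 4, 5, 5, 7, 9]

print(average(data))            # 5.0
print(statistics.median(data))  # 4.5
print(sd(data))                 # 2.0
print(fraction_within(data, 1)) # 0.75
```

The average (5) sits to the right of the median (4.5), as expected with a longer right-hand tail. With only eight entries, 75% within one SD is a rough match to the 68% rule, which is all the rule promises.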

Chapter 5. The Normal Approximation for Data

This chapter ties together histograms, the average, the SD, and the normal curve. The normal curve was discovered around 1720 by Abraham de Moivre, while he was developing the mathematics of chance.

The normal curve has an equation:

$$y = \frac{100\%}{\sqrt{2\pi}}\, e^{-x^2/2}, \quad \text{where } e = 2.71828\ldots$$
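To get a feel for the equation, the sketch below evaluates the height of the normal curve, taken as y = 100%/sqrt(2π) times e^(-x²/2), at a few values of x in standard units. The function name is an assumption of this example.

```python
import math


def normal_curve_height(x):
    """Height of the normal curve at x, in percent per standard unit."""
    return 100.0 / math.sqrt(2 * math.pi) * math.exp(-x * x / 2)


for x in (0, 1, 2, 3):
    print(x, round(normal_curve_height(x), 2))
```

The peak at x = 0 is just under 40% per standard unit, and the curve is symmetric about 0 because x enters only through its square.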

[Figure: the normal curve. Vertical axis: percent per standard unit, from 0 to 45. Horizontal axis: standard units, from -4 to 4.]

The area under the normal curve between -1 and +1 is about 68%.
The area under the normal curve between -2 and +2 is about 95%.
The area under the normal curve between -3 and +3 is about 99.7%.

Many histograms for data are similar in shape to the normal curve, provided they are drawn to the same scale. Making the horizontal scales match up involves standard units.
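The three areas can be checked numerically. The sketch below is one way to do it, using the error function from Python's standard library; the function name is an assumption of this example, and the notes themselves rely on a normal table rather than code.

```python
import math


def area_between(a, b):
    """Percent of the area under the normal curve between a and b (standard units)."""

    def area_to_left(z):
        # Fraction of the area to the left of z, via the error function.
        return 0.5 * (1 + math.erf(z / math.sqrt(2)))

    return 100.0 * (area_to_left(b) - area_to_left(a))


for k in (1, 2, 3):
    print(k, round(area_between(-k, k), 2))
```

The computed values are close to 68.27%, 95.45%, and 99.73%, which round to the figures quoted in the notes.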

A value is converted to standard units by seeing how many SDs it is above or below the average.

In figure 2, the shaded area under the histogram between 61 inches and 66 inches represents the percentage of women with heights in that range, which is the interval within 1 SD of the average. By inspection, the shaded area is about equal to the area under the normal curve between -1 and +1. This last area is 68%, justifying the 68% rule.

The two vertical scales match up in the following way:

$$10\%\ \text{per inch} = \frac{10\%}{1\ \text{inch}} = \frac{2.5 \times 10\%}{2.5 \times 1\ \text{inch}} = \frac{25\%}{2.5\ \text{inches}} = \frac{25\%}{1\ \text{standard unit}} = 25\%\ \text{per standard unit}$$

(R5)

Normal approximation (examples 8-9 on pp. 85-87)
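The conversion to standard units and the matching of vertical scales can be sketched as follows. The SD of 2.5 inches comes from the scale calculation in the notes; the average of 63.5 inches is inferred here as the midpoint of the 61 to 66 inch interval and should be read as an assumption. The function names are also assumptions of this example.

```python
import math

AVG_HEIGHT = 63.5  # inches; inferred as the midpoint of 61 and 66 (assumption)
SD_HEIGHT = 2.5    # inches; 1 standard unit = 2.5 inches in the notes


def to_standard_units(value, avg, sd):
    """How many SDs the value is above (+) or below (-) the average."""
    return (value - avg) / sd


def per_standard_unit(percent_per_inch, sd):
    """Convert a histogram height from % per inch to % per standard unit."""
    return percent_per_inch * sd


def normal_area_between(a, b):
    """Percent of the area under the normal curve between a and b (standard units)."""
    def area_to_left(z):
        return 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return 100.0 * (area_to_left(b) - area_to_left(a))


low = to_standard_units(61, AVG_HEIGHT, SD_HEIGHT)
high = to_standard_units(66, AVG_HEIGHT, SD_HEIGHT)
print(low, high)                                # -1.0 1.0
print(per_standard_unit(10, SD_HEIGHT))         # 25.0
print(round(normal_area_between(low, high), 1)) # 68.3
```

The normal approximation follows the same two steps every time: convert the endpoints to standard units, then read off the area under the normal curve between them.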

For reasons of their own, statisticians call de Moivre's curve normal. This gives the impression that other curves are abnormal. Not so. Many histograms follow the normal curve very well, and many others, like the income histogram, do not. Later in the book, we will present a mathematical theory which helps explain when histograms should follow the normal curve.

Finding percentiles for the normal curve. (p. 265, #9)

Change of scale:

1. Adding the same number to every entry on a list adds that constant to the average; the SD does not change.
2. Multiplying every entry on a list by the same positive number multiplies the average and the SD by that constant.
3. These changes of scale do not change the standard units.
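The three change-of-scale facts can be verified on a small list. The data, the constants 10 and 3, and the function names below are all invented for this illustration.

```python
import math


def average(xs):
    return sum(xs) / len(xs)


def sd(xs):
    """SD as the r.m.s. deviation from average (divides by the number of entries)."""
    avg = average(xs)
    return math.sqrt(sum((x - avg) ** 2 for x in xs) / len(xs))


def standard_units(xs):
    avg, spread = average(xs), sd(xs)
    return [(x - avg) / spread for x in xs]


data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical list: average 5, SD 2
shifted = [x + 10 for x in data]  # rule 1: add a constant
scaled = [3 * x for x in data]    # rule 2: multiply by a positive constant

print(average(shifted), sd(shifted))  # 15.0 2.0
print(average(scaled), sd(scaled))    # 15.0 6.0
print(standard_units(data) == standard_units(shifted) == standard_units(scaled))
```

Rule 3 is the reason standard units are useful: converting inches to centimeters, or degrees Fahrenheit to Celsius, leaves every entry in the same position relative to the average.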

Chapter 6. Measurement Error

The SD of a series of repeated measurements gives the likely size of the chance error in each one.

individual measurement = exact value + chance error

The variability in repeated measurements reflects the variability in the chance errors, and both are gauged by the SD of the data. Mathematically, the SD of the chance errors must equal the SD of the measurements: adding the exact value is just a change of scale.

With outliers, many histograms just do not follow the normal curve (figure 2). There is a hard choice to make when investigators see an outlier. Either they ignore it, or they have to concede that their measurements don't follow the normal curve. The prestige of the curve is so high that the first choice is the usual one: a triumph of theory over experience.

Bias affects all measurements the same way, pushing them in the same direction. Chance errors change from measurement to measurement, sometimes up and sometimes down.

The basic equation has to be modified when each measurement is thrown off by bias as well as chance error:

individual measurement = exact value + bias + chance error

Usually, bias cannot be detected just by looking at the measurements themselves. Instead, the measurements have to be compared to an external standard or to theoretical predictions.
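The measurement model can be illustrated with a small numerical sketch. Everything below is hypothetical: the exact value, the bias, and the chance errors are invented, and the chance errors are chosen to average exactly zero so the arithmetic stays clean.

```python
import math


def average(xs):
    return sum(xs) / len(xs)


def sd(xs):
    """SD as the r.m.s. deviation from average."""
    avg = average(xs)
    return math.sqrt(sum((x - avg) ** 2 for x in xs) / len(xs))


EXACT_VALUE = 100   # hypothetical true value, arbitrary units
BIAS = 5            # hypothetical bias: same push on every measurement
chance_errors = [-3, -1, -1, -1, 0, 0, 2, 4]  # hypothetical; average 0, SD 2

unbiased = [EXACT_VALUE + e for e in chance_errors]
biased = [EXACT_VALUE + BIAS + e for e in chance_errors]

# The spread of the measurements equals the spread of the chance errors,
# with or without bias: adding a constant is just a change of scale.
print(sd(chance_errors), sd(unbiased), sd(biased))  # 2.0 2.0 2.0

# Bias shows up only when the average is compared to an external standard.
print(average(biased) - EXACT_VALUE)                # 5.0
```

The two series have identical SDs, so nothing in the measurements alone distinguishes the biased series from the unbiased one. Only the comparison with the known value reveals the bias.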