Standard Deviation and Standard Error Tutorial
This is very important. Get your AP Equations and Formulas sheet.


The Basics
Let's start with a review of the basics of statistics.
Mean: What most people consider the average: the sum of all scores divided by the number of scores. The mean is a good measure of the center for normally distributed data.
Median: The middle value when the data are put in order. If there is an even number of values, the median is the mean of the two middle values. The median is a better measure of the center for data that are not normally distributed (for example, skewed data).
Mode: The most frequently occurring value in the data. If no value repeats, the data set has no mode.
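If you want to check these three definitions by computer, here is a minimal Python sketch. It uses the plant heights from the variance example later in this tutorial plus a small hypothetical data set with an even number of values. The `modes` helper is our own illustration: the standard library's `statistics.multimode` would return every value when nothing repeats, so the helper returns an empty list instead, matching the "no mode" rule above.

```python
import statistics
from collections import Counter

def modes(values):
    """Return every value that occurs most often; empty list if no value repeats."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []  # no value repeats, so there is no mode
    return sorted(v for v, c in counts.items() if c == top)

heights = [10, 7, 6, 8, 9]       # plant heights (cm) from the variance example
print(statistics.mean(heights))    # 8
print(statistics.median(heights))  # 8 (middle of 6, 7, 8, 9, 10)
print(modes(heights))              # [] (no value repeats)

even = [2, 4, 4, 6, 8, 9]        # hypothetical data with an even number of values
print(statistics.median(even))     # 5.0 (mean of the two middle values, 4 and 6)
print(modes(even))                 # [4]
```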

Data Distribution
Feast your eyes on this data and try to get a rough sense of how a histogram (frequency chart) would look. Where would the peak be?

Distribution Chart of Heights of 100 Control Plants
Height of plants (cm)   Number of plants
0.0-0.9                  3
1.0-1.9                 10
2.0-2.9                 21
3.0-3.9                 30
4.0-4.9                 20
5.0-5.9                 14
6.0-6.9                  2
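To check your guess, this short Python sketch draws the frequency table above as a sideways text histogram and picks out the peak (the class with the largest count). It uses only the counts from the table; the variable names are illustrative.

```python
# Frequency table from the slide: height class (cm) and number of plants in it
classes = ["0.0-0.9", "1.0-1.9", "2.0-2.9", "3.0-3.9", "4.0-4.9", "5.0-5.9", "6.0-6.9"]
counts = [3, 10, 21, 30, 20, 14, 2]

total = sum(counts)  # should be 100 plants
# The peak of the histogram is the class with the largest count
peak = classes[counts.index(max(counts))]

# Sideways text histogram: one '#' per plant
for label, n in zip(classes, counts):
    print(f"{label} | {'#' * n} ({n})")
print("total:", total, "| peak class:", peak)
```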

Data Distribution
This is approximately a normal distribution, also known as a bell curve. Most plants fall in the middle height classes (71 of the 100 are between 2.0 and 4.9 cm), and the counts taper off on both sides.
[Figure: Number of Plants in Each Class, a bar chart of plant count by height class (cm)]

Abnormal Distribution?
Human height is fairly close to a normal distribution. The average U.S. woman (age 20+) is 5'4", and the average U.S. man (age 20+) is 5'9.5". About 50% of people are at or above average and 50% are at or below average.
What, then, is not a normal distribution? Imagine if most women were 5'4", but no one were taller. That is not a normal distribution, and it won't look like a bell curve.

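Abnormal Distribution
The same goes for test scores. If the average on a test is 80%, the scores are not necessarily normally distributed: scores cannot go above 100%, so they often bunch up near the top with a tail of low scores. That's why the median is often better than the mean for summarizing test scores. Imagine if the average were 100%: every student would have a perfect score, which is definitely not a normal distribution.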
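Here is a small illustration of why the median can be the better summary for skewed test scores. The scores below are hypothetical, invented only for this example: most students did well and one did very poorly. That single low score drags the mean down by more than 10 points, while the median stays near where most students scored.

```python
import statistics

# Hypothetical test scores (percent): five high scores and one very low score
scores = [95, 92, 90, 88, 85, 20]

mean_score = statistics.mean(scores)      # pulled down by the one low score
median_score = statistics.median(scores)  # mean of the two middle scores, 88 and 90
print(f"mean = {mean_score:.1f}, median = {median_score}")
```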

Back to Standard Deviation/Error
Suppose two students take a test. One gets 100%, and one gets 0%. What's the mean? 50%.
Suppose two other students take the test, and both get 50%. What's the mean? 50%.
So it's the same mean, but we got there very differently, and that could tell us a lot about the test. Variance measures how spread out the data are around the mean: roughly, the average squared distance of each value from the mean.

Variance
Variance is given by the symbol $s^2$. A high variance indicates that values deviate a lot from the mean. A low variance indicates relatively stable values.

Calculating Variance
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
where:
Σ means "sum of": perform the numerator operation for each number in the data set and add up the results.
$x_i$ is an individual number in your data set.
$\bar{x}$ (read: "x bar") is the mean of your data.
$n$ is your sample size.

Sample Samples
Let's try calculating the variance:
Plant   Height (cm)   Deviation from mean   Squared deviation from mean
        $x_i$         $x_i - \bar{x}$       $(x_i - \bar{x})^2$
A       10             2                     4
B        7            -1                     1
C        6            -2                     4
D        8             0                     0
E        9             1                     1
Mean: $\bar{x} = 8$. Sum of squared deviations: $\sum (x_i - \bar{x})^2 = 10$.
Divide by $n - 1$: $s^2 = 10 / (5 - 1) = 2.5$
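The same calculation can be written as a short Python function that follows the formula step by step: find the mean, square each deviation, add them up, and divide by $n - 1$. This is a minimal sketch; `sample_variance` is our own illustrative name, and the result is checked against the standard library's `statistics.variance`, which also divides by $n - 1$. The last two lines apply it to the two-student tests from earlier: same mean, very different spread.

```python
import statistics

def sample_variance(data):
    """Sample variance: sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(data) / n
    squared_deviations = [(x - mean) ** 2 for x in data]
    return sum(squared_deviations) / (n - 1)

heights = [10, 7, 6, 8, 9]  # plants A-E from the table (cm)
print(sample_variance(heights))      # 2.5
print(statistics.variance(heights))  # 2.5 (standard-library check)

# The two-student tests: both have a mean of 50%
print(sample_variance([100, 0]))     # 5000.0 (very spread out)
print(sample_variance([50, 50]))     # 0.0 (no spread at all)
```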

Whoo, variance! Now what?
The standard deviation is simply the square root of the variance, so its symbol is $s$:
$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
In our example, $s^2$ (variance) is 2.5, so $s$ (standard deviation) is $\sqrt{2.5} \approx 1.58$ cm.
Now, you may be asking why we bother with this statistic if variance seems to do the same thing. The reason is that we can use it to make inferences and statements about the data, in the same way we used chi-squared tables to make inferences about the role of chance.
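In Python, the standard deviation is one square root away from the variance. This sketch reuses the plant heights and confirms the result with `statistics.stdev`. One practical benefit worth noticing: the variance is in squared units (cm²), while the standard deviation is back in the same units as the data (cm).

```python
import math
import statistics

heights = [10, 7, 6, 8, 9]              # plants A-E (cm)
variance = statistics.variance(heights)  # 2.5, in cm squared
sd = math.sqrt(variance)                 # standard deviation, back in cm

print(round(sd, 2))                          # 1.58
print(round(statistics.stdev(heights), 2))   # 1.58, computed directly
```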

Standard Deviation (SD) Inferences
If you assume the data are normally distributed:
68.27% of the data is within 1 SD of the mean. A value in this range is not meaningfully different from the mean.
95.45% of the data is within 2 SD of the mean. A value outside this range is probably an outlier.
99.73% of the data is within 3 SD of the mean. A value outside this range is almost certainly an outlier.
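Where do 68.27%, 95.45%, and 99.73% come from? For a normal distribution, the fraction of values within $k$ standard deviations of the mean is $\operatorname{erf}(k/\sqrt{2})$. This minimal Python sketch reproduces the three percentages with `math.erf`; `fraction_within` is our own illustrative helper name.

```python
import math

def fraction_within(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    # For a standard normal variable Z: P(-k < Z < k) = erf(k / sqrt(2))
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {100 * fraction_within(k):.2f}%")
# within 1 SD: 68.27%
# within 2 SD: 95.45%
# within 3 SD: 99.73%
```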

Standard Deviation (SD) Inferences
Suppose the average height of a population is 6 feet (SD = 0.5 feet). If the population is normally distributed:
68.27% of the population is between 5.5 and 6.5 feet.
95.45% of the population is between 5 and 7 feet.
99.73% of the population is between 4.5 and 7.5 feet.
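The same height example can be checked with Python's built-in `statistics.NormalDist`, which models a normal distribution with a given mean and SD. The sketch builds the 1, 2, and 3 SD ranges around 6 feet and uses the cumulative distribution function (`cdf`) to find what share of the population falls inside each range.

```python
from statistics import NormalDist

heights = NormalDist(mu=6.0, sigma=0.5)  # the slide's population: mean 6 ft, SD 0.5 ft

rows = []
for k in (1, 2, 3):
    low = heights.mean - k * heights.stdev
    high = heights.mean + k * heights.stdev
    share = heights.cdf(high) - heights.cdf(low)  # fraction of people between low and high
    rows.append((low, high, share))
    print(f"{low:.1f} to {high:.1f} ft: {100 * share:.2f}%")
```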

Standard Deviation
The standard deviation (along with the mean and variance) allows us to learn something about an entire population from just a sample, assuming the data are normally distributed. For example, if we measured the heights of a sample of pro basketball players, we could generalize from our sample to the entire NBA.
Key: The more samples we take, and therefore the more means we calculate, the closer the average of those means tends to get to the actual mean of the entire league.

Standard Error
The standard error of the mean (SEM), or just plain standard error, is a way to estimate how far our sample mean is likely to be from the true population mean because of chance alone. Oddly, it's a little like $\chi^2$.
Example: Consider the NBA player height survey. We could sample 10 players, find their average height, and calculate the standard deviation. If we kept sampling 10 players over and over again, however, the mean of our calculated means would get closer and closer to the true mean. The standard error of the mean helps us figure out how close our calculated mean is likely to be to the true mean, even without knowing the true mean.
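To see the "sample over and over again" idea in action, here is a small simulation sketch. It assumes a hypothetical population of player heights that is normally distributed with mean 78 in. and SD 3.5 in. (illustrative numbers, not real NBA data), draws 10,000 random samples of 10 players each, and records each sample's mean. The spread (SD) of those sample means is exactly what the standard error estimates, and it comes out close to the $s/\sqrt{n}$ formula that appears later in this tutorial.

```python
import random
import statistics

def simulate_sample_means(pop_mean, pop_sd, n, repeats, seed=1):
    """Draw `repeats` random samples of size n from a normal population; return their means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        statistics.fmean(rng.gauss(pop_mean, pop_sd) for _ in range(n))
        for _ in range(repeats)
    ]

# Hypothetical population: mean 78 in., SD 3.5 in. (illustrative only)
means = simulate_sample_means(78, 3.5, n=10, repeats=10_000)

print(f"mean of sample means: {statistics.fmean(means):.2f}")  # close to 78
print(f"SD of sample means:   {statistics.stdev(means):.3f}")  # close to the formula below
print(f"SD / sqrt(n):         {3.5 / 10 ** 0.5:.3f}")          # about 1.107
```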

Standard Error
To put it another way: if we survey 10 players, that's a small sample. Is it likely that those 10 players closely represent the league? Probably not. If we survey 300 players, that's a large sample. Is it likely that those 300 players closely represent the league? Probably.

Standard Error, Yet Another Way
In hockey, one statistic is SOG (shots on goal): the number of shots a team takes that would have gone in if there were no goalie. Now, for those of you who don't watch or play hockey, suppose I asked you to determine the average number of SOG a team gets in a game. Here's the data:

Standard Error, Yet Another Way
SOG per game: Game 1: 22, Game 2: 20, Game 3: 21, Game 4: 21.
What's the average? You'd say it's probably around 21, right? It may be off, but probably only by a little. So you'll have a relatively small standard error, because the data are consistent.

Standard Error, Yet Another Way
What if I gave you these data instead?
SOG per game: Game 1: 41, Game 2: 19, Game 3: 29, Game 4: 56.
What's the average? It works out to about 36, but you're probably not nearly as confident in that number. So you'll have a relatively large standard error, because the data are not very consistent.
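Putting numbers on the two hockey data sets makes the contrast concrete. This Python sketch computes the mean and the standard error (sample SD divided by $\sqrt{n}$) for each set of four games; `standard_error` is our own illustrative helper. The consistent games give an SE of about 0.41 shots, while the scattered games give about 7.97 shots, even though both use the same number of games.

```python
import math
import statistics

def standard_error(data):
    """Standard error of the mean: sample SD divided by the square root of n."""
    return statistics.stdev(data) / math.sqrt(len(data))

consistent = [22, 20, 21, 21]  # SOG, first set of games
scattered = [41, 19, 29, 56]   # SOG, second set of games

for label, games in (("consistent", consistent), ("scattered", scattered)):
    print(f"{label}: mean = {statistics.mean(games)}, SE = {standard_error(games):.2f}")
# consistent: mean = 21, SE = 0.41
# scattered: mean = 36.25, SE = 7.97
```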

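Standard Error
The formula for standard error should now make sense:
$$SE_{\bar{x}} = \frac{s}{\sqrt{n}}$$
where $s$ is the standard deviation and $n$ is the sample size. The closer the standard error is to 0, the better: a smaller standard error means the sample mean is a more precise estimate of the population mean.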
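Because $n$ sits under a square root, a bigger sample shrinks the standard error, but more slowly than you might expect. This sketch holds a hypothetical standard deviation of 3.5 in. fixed (an illustrative value, not real data) and compares surveying 10 players with surveying 300.

```python
import math

s = 3.5  # hypothetical sample SD of player heights (in.), held fixed to isolate the effect of n

ses = {n: s / math.sqrt(n) for n in (10, 300)}
for n, se in ses.items():
    print(f"n = {n:>3}: SE = {se:.3f} in.")
# n =  10: SE = 1.107 in.
# n = 300: SE = 0.202 in.
```

Going from 10 to 300 players (30 times as many) shrinks the SE by a factor of $\sqrt{30} \approx 5.5$, not 30. To cut the standard error in half, you need four times as many measurements.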

Standard Error vs. Standard Deviation
Key: Standard deviation measures how far the raw data deviate from the sample's mean. Think of how far an individual NBA player's height is from the average height of the surveyed players.
Key: Standard error measures how far the sample's mean is likely to be from the actual population's mean. Think of how far the mean height of our surveyed players is likely to be from the true mean height of all the players in the league.

One last way to understand this
Remember the potato cores? You can calculate the average potato core mass, but that doesn't tell you how consistent the masses were. That's why we have standard deviation. Once you have a mean for your samples, it also doesn't tell you whether your set of potato cores was representative of all the cores I was slicing. That's why we have standard error.

Standard Error vs. Standard Deviation
Interpreting data: Generally, you want the standard deviation to be low. This means your underlying data set is more consistent. Why is that important?
You definitely want the standard error to be low. How can we minimize standard error?
Have a low standard deviation (largely out of our control).
Have a large sample size (in our control).

Confidence Intervals & Error Bars
In addition to the inferences about data from before (68% within one SD, et cetera), we can also make inferences using the SEM. These are more important for biology. Traditionally, 95% is the confidence we need in our data (just like in chi-squared analyses). Using the SEM, we can build a 95% confidence interval for the mean, which is shown on a graph as error bars. Let's take a closer look.

Confidence Intervals & Error Bars
Suppose you want to see whether Central Bucks HS students are significantly taller than Council Rock HS students. You can't do a $\chi^2$ analysis, because there are no expected values. So you measure some of the students from each district and take the mean; you can't measure all of them, because that would take forever. You get the SD and SEM shown below:
District        Mean     Standard Deviation   Standard Error
Council Rock    72 in.   6 in.                1.90 in.
Central Bucks   80 in.   4 in.                1.26 in.
Let's graph the means.
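The slide does not say how many students were measured in each district. As an assumption, this sketch uses $n = 10$ per district, which reproduces both values in the Standard Error column when plugged into $SE = s/\sqrt{n}$. It is a consistency check on the table, not a statement of the original study design.

```python
import math

# District: (mean height in inches, sample SD in inches), from the table
districts = {"Council Rock": (72, 6), "Central Bucks": (80, 4)}
n = 10  # ASSUMPTION: sample size per district is not stated; n = 10 matches the table

se = {name: sd / math.sqrt(n) for name, (mean, sd) in districts.items()}
for name, value in se.items():
    print(f"{name}: SE = {value:.2f} in.")
# Council Rock: SE = 1.90 in.
# Central Bucks: SE = 1.26 in.
```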

[Figure: Mean Height of High School Students, a bar chart of mean height (in.) by district for Council Rock and Central Bucks]

Confidence Intervals & Error Bars
Okay, now let's figure out a 95% confidence interval for each district. The 95% confidence interval is traditionally taken as ± 2 SEM about the mean (the exact multiplier for a normal distribution is 1.96). In this case:
C. Rock = 72 in. ± 3.80 in. (since 1.90 in. × 2 = 3.80 in.)
C. Bucks = 80 in. ± 2.52 in. (since 1.26 in. × 2 = 2.52 in.)
Now let's draw the intervals on the graph.
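The interval arithmetic and the overlap check can be sketched in a few lines of Python. It follows the slide's ± 2 SEM rule (the `multiplier` parameter is our own addition so you can try 1.96 instead). Two intervals overlap only if each one starts before the other one ends.

```python
# Means and standard errors (inches) from the table
districts = {"Council Rock": (72, 1.90), "Central Bucks": (80, 1.26)}

def ci_95(mean, se, multiplier=2):
    """Approximate 95% confidence interval: mean plus or minus 2 SEM."""
    return (mean - multiplier * se, mean + multiplier * se)

intervals = {name: ci_95(m, se) for name, (m, se) in districts.items()}
for name, (low, high) in intervals.items():
    print(f"{name}: {low:.2f} to {high:.2f} in.")
# Council Rock: 68.20 to 75.80 in.
# Central Bucks: 77.48 to 82.52 in.

(lo1, hi1), (lo2, hi2) = intervals["Council Rock"], intervals["Central Bucks"]
overlap = lo1 <= hi2 and lo2 <= hi1  # each interval must start before the other ends
print("intervals overlap:", overlap)  # False
```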

[Figure: Mean Height of High School Students, the same bar chart with 95% confidence intervals drawn as error bars]
The error bars are the 95% confidence intervals. Since they don't overlap between the districts, there is probably a significant difference between the heights of the two.

Confidence Interval Frame of Mind
When you construct a graph with confidence intervals and find that they do overlap, it suggests (but does not prove) that the results are not significant (a null result). It's possible that the real average height of ALL Council Rock students is actually equal to that of ALL Central Bucks students, and that the difference we measured is just sampling error. In other words, there is some average height, within both confidence intervals, that could make the two districts equal. If there is no overlap, it suggests a significant difference.

Practice
Standard Deviation and Standard Error: Procedural Practice

Practice
How else are we going to practice standard deviation and standard error? With your data! Find the potato core measurements in your lab notebooks. With your lab group, calculate the standard deviation and standard error for your data set. See why I had you measure each core's mass individually?

Practice
Calculate the standard deviation: What is the SD of your set of eighteen cores before the study, and what is the SD of your eighteen cores afterward?
Calculate the standard error: For each set of data, how close is your average potato core mass likely to be to the actual average mass of all the slices I cut for our lab?
No error bars or graph needed.
Last key note: Your units for SD and SE match the units of the mean (here, grams).