The normal curve and standardisation. Percentiles, z-scores

The normal curve. Frequencies (histogram). Characterised by: central tendency (mean, median, mode: uni-, bi-, multimodal); skew (positively skewed, negatively skewed); variability (spread, dispersion): range, interquartile range, standard deviation and variance.

Normal curves? Blue, pink and green marks?

Another key term Outlier

The Normal Distribution. Described by: Shape: unimodal. Central tendency: mean = median = mode. Variability: ??

The normal curve and variability

Several natural distributions are described by the normal curve (if the sample is large enough). If the data are normally distributed, we can standardise the scores and compare different data sets. Lucy is 169 cm tall. Lucy has 673 Facebook friends. Is she as popular as she is tall?

We need a measure for which we know the percentiles (what percentage of the data are below the score): the distance from the mean in standard deviations (for a normal curve). Standard score (z-score): the distance below or above the mean in standard deviations.

Probability Distribution

Boxplot and the normal distribution

Standard Scores (Z-scores). A way to put scores on a common scale. Computed by subtracting the mean from the score and dividing by the standard deviation. Interpreting the z-score: positive z-scores are above the mean; negative z-scores are below the mean. The larger the absolute value of the z-score, the further the score is from the mean. A z-score of 0 means that the value = ?

Z-Scores: The Standard Deviation Meter. Use z-scores to express values regardless of the original unit of measure. Once you have the standard deviation, you can go from raw scores to z-scores, and from z-scores to raw scores: $z = \frac{X - M}{SD}$ and $X = z \cdot SD + M$, where $SD = \sqrt{\frac{\sum (X - M)^2}{N - 1}}$.

How To Calculate Z-scores: $z = \frac{X - M}{SD}$. Wolf pups in den:

X    X-M    (X-M)^2    Z-score
5
3
6
9
M =    SS =    s^2 =    SD =

How To Calculate Z-scores: $z = \frac{X - M}{SD}$. Pups in den:

X    X-M      (X-M)^2    Z-score
5    -0.75     0.56      -0.3
3    -2.75     7.56      -1.1
6     0.25     0.06       0.1
9     3.25    10.56       1.3
M = 5.75    SS = 18.75    s^2 = 6.25    SD = 2.5
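The pups-in-den table can be reproduced with a few lines of Python. This is only a minimal sketch: the function name `z_scores` is made up for illustration, and it assumes the sample formula with N - 1 in the denominator, which is what the slide's numbers (SS = 18.75, s^2 = 6.25) imply.

```python
from math import sqrt

def z_scores(values):
    """Return (mean, sum of squares, variance, SD, z-scores) for a sample."""
    n = len(values)
    mean = sum(values) / n
    ss = sum((x - mean) ** 2 for x in values)   # SS: sum of squared deviations
    variance = ss / (n - 1)                      # s^2, dividing by N - 1
    sd = sqrt(variance)
    zs = [(x - mean) / sd for x in values]       # z = (X - M) / SD
    return mean, ss, variance, sd, zs

pups = [5, 3, 6, 9]                              # data from the slide
mean, ss, variance, sd, zs = z_scores(pups)
print("M =", mean, "SS =", ss, "s^2 =", variance, "SD =", sd)
print("z-scores:", [round(z, 1) for z in zs])
```

Going back from a z-score to a raw score is the reverse step, `z * sd + mean`.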

Exercise. Take your data set (height, Facebook friends, etc.). Calculate the z-scores. SPSS: Analyze -> Descriptive Statistics -> Descriptives -> Save standardized values as variables. Take an individual. Are they as tall as they are popular? Do they drink as much as they sleep?

Z-Scores & Percentiles. http://www.mathsisfun.com/data/standard-normal-distributiontable.html Each z-score is associated with a percentile. Z-scores tell us the percentile of a particular score. They can tell us the % of the population above or below a score, the % of the population between the score and the mean, and the % between the score and the tail.

Example: Fitness. Jessica can run round Margit sziget in 45 minutes. The mean for women is 38 minutes. The standard deviation for women is 6.35 minutes. $z = \frac{X - M}{SD} = \frac{45 - 38}{6.35} = 1.1$. According to the z-score table, the area between the mean and z = 1.1 is 36.4%.

Running example: This tells us that 36.4% of the population is between the mean and Jessica's score. In other words, there is a 36.4% chance that a randomly chosen woman is slower than average but no slower than Jessica. 50 + 36.4 = 86.4% of women are faster than Jessica, and there is a 100 - 86.4 = 13.6% chance of someone being this slow by CHANCE ALONE. (The nearest tail is the positive tail (100th percentile), so you take the difference between 100 and 86.4.)
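The table lookup for Jessica can also be done in code. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `tail_areas` is hypothetical, and z is rounded to one decimal place only to match the value read off the table on the slide.

```python
from statistics import NormalDist

def tail_areas(score, mean, sd):
    """Return z, the share below the score, the share between the mean
    and the score, and the share in the nearest tail."""
    z = round((score - mean) / sd, 1)        # rounded as in the slide's table lookup
    below = NormalDist().cdf(z)              # proportion with a lower score
    between = abs(below - 0.5)               # area between the mean and the score
    nearest_tail = min(below, 1 - below)     # area from the score to the nearest tail
    return z, below, between, nearest_tail

# Jessica: 45 minutes; women: mean 38, SD 6.35 (numbers from the slide)
z, below, between, tail = tail_areas(45, 38, 6.35)
print(f"z = {z}")
print(f"faster than Jessica: {below:.1%}")
print(f"between mean and Jessica: {between:.1%}")
print(f"as slow or slower: {tail:.1%}")
```

Because the score here is a running time, a higher score means a slower runner, so the share "below" Jessica is the share of women who are faster.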

Running example: exercise 1. Cecilia runs round the island in 33 minutes. What is the probability that someone is that fast by chance alone? (= What percentage of scores are between Cecilia's score and the nearest tail of the distribution?) (The mean for women is 38 minutes, the standard deviation for women is 6.35 minutes.) ESTIMATE THE ANSWER FIRST, THEN DO THE CALCULATIONS

Running example: exercise 2. A person dressed in unisex winter clothes is known to run round the island in 22 minutes. What is the probability that this person is a member of the population of women? (The mean for women is 38 minutes, the standard deviation for women is 6.35 minutes.) ESTIMATE THE ANSWER FIRST, THEN DO THE CALCULATIONS

Running example: exercise 3. What's the probability of someone being as far from the mean in either direction as Jessica? Cecilia? The unidentified stranger?

Hypothesis testing: the z test

Inferential statistics. Your hypothesis predicts a difference between means. If you look at the means, you can tell whether there is a difference or not, but you cannot tell whether this difference is due to chance, because there is chance variation across people: every time you test the SAME people under the SAME circumstances, you'll have slightly different results by chance. Inferential statistics will tell you how likely it is that the difference you observe is due to chance. In other words: how likely it is that one mean comes from the same population as the other mean. If the difference is probably not due to chance, you have statistically significant results.

What do we mean by "probably"? A statistical test tells you how likely it is that your observed means come from the same population. Highest probability: 100% (p = 1), meaning there is a 100% chance that the difference between the means is due to chance: you definitely cannot reject the null hypothesis. Low probability: 0.00000001% (p = .0000000001), meaning there is an extremely low chance that the difference between the means is due to chance: you can definitely reject the null hypothesis.

What do we mean by "probably"? We have to decide on a cut-off point (critical value, alpha). It depends on how much risk of Type I error you are willing to take. High-stakes research: alpha = .001; you take your results to be statistically significant if p < .001. This means that there is a 0.1% chance that the difference between the means is due to chance (Type I error). Low-stakes research: alpha = .05; you take your results to be statistically significant if p < .05. This means that there is a 5% chance that the difference between the means is due to chance (Type I error).

One-tailed or two-tailed? A one-tailed hypothesis: when the difference is predicted to be in a certain direction. A two-tailed hypothesis: when the difference is predicted to be in either direction. If your cut-off point is .05 for a one-tailed hypothesis, what is it for a two-tailed hypothesis?

Statistical Significance A finding is statistically significant if the data differ from what we would expect from chance alone, if there were, in fact, no actual difference. They may not be significant in the sense of big, important differences, but they occurred with a probability below the critical cutoff value. Choice of statistical test to use depends on the distribution of the data and on the research design

The simplest test: z test

The Central Limit Theorem. The central limit theorem states that IF you take (a) several (b) random samples (c) from ANY SHAPED population, THEN the distribution of sample means will become approximately normally distributed, (a) becoming more accurate the larger the size of each sample, and (b) with mean M and standard deviation $SD / \sqrt{N}$. William Sealy Gosset ("Student") and Guinness: how to select the best ingredients.

Why? By using means rather than individual scores, we eliminate extreme scores.

Distribution of Means and Sample Size. As the sample size of each sample in the distribution of means increases, the normal curve becomes narrower and taller: smaller spread. For a distribution of means, the standard deviation from the mean is the Standard Error (of the Mean).

Normal Distribution vs. Distribution of Means. Normal distribution: standard deviation and z-scores, $SD = \sqrt{\frac{\sum (X - M)^2}{N - 1}}$ and $z = \frac{X - M}{SD}$. Distribution of means: standard error, $SE = \frac{SD}{\sqrt{N}}$.

Symbols

Distribution    Symbol for Mean    Symbol for Spread    Name of Spread
Scores          μ                  σ (SD)               Standard Deviation
Means           μ_M                σ_M (SE)             Standard Error
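The relation SE = SD / sqrt(N) can be checked with a small simulation. Everything in this sketch is an assumption made for illustration: the population is an artificial, strongly skewed one (exponential), the seed and sample counts are arbitrary, and `sd_of_sample_means` is a made-up helper name.

```python
import random
from statistics import mean, pstdev, stdev

random.seed(0)  # arbitrary seed, only so that reruns give the same numbers

# Artificial skewed population (NOT normal), to show the shape does not matter
population = [random.expovariate(1.0) for _ in range(20_000)]
pop_sd = pstdev(population)

def sd_of_sample_means(sample_size, n_samples=1_000):
    """Draw many random samples and return the SD of their means."""
    means = [mean(random.sample(population, sample_size))
             for _ in range(n_samples)]
    return stdev(means)

for n in (4, 25, 100):
    observed = sd_of_sample_means(n)
    predicted = pop_sd / n ** 0.5            # SE = SD / sqrt(N)
    print(f"N = {n:3d}  observed spread of means = {observed:.3f}  "
          f"SD / sqrt(N) = {predicted:.3f}")
```

The observed spread of the sample means should shrink as N grows and stay close to the predicted standard error.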

Compare the two distributions: Facebook friends
1. Distribution of individual scores for whole class
   1. take everyone's data
   2. draw the histogram using the individual scores
2. Distribution of means for individual students
   1. take only your group's data
   2. calculate the mean
   3. take everyone's means
   4. draw the histogram using the means

Real life. We rarely test a single individual. We are more likely to compare means to test whether they could come from the same population. But that's more complicated, because the test population also has some variance. To test whether a sample comes from a population, you can use the z-test.

The z-test
1. Take the mean and size of your sample
2. Take the mean and standard deviation of the comparison population
3. Calculate the Standard Error: divide the population SD by the square root of the sample N
4. Calculate the z-value: (the sample mean - the population mean) / Standard Error
5. Check the z-value against the Standard Normal Distribution to find out the probability of the sample coming from the comparison population
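The five steps can be sketched as a short function. The numbers in the example call (population mean 100, SD 15, sample mean 104, N = 36) are hypothetical and chosen only for easy arithmetic; the function name `z_test` and the choice to report a two-tailed probability are likewise assumptions, not something the slides specify.

```python
from math import sqrt
from statistics import NormalDist

def z_test(sample_mean, sample_n, pop_mean, pop_sd):
    """Return the standard error, the z-value and a two-tailed probability."""
    se = pop_sd / sqrt(sample_n)                 # step 3: SE = SD / sqrt(N)
    z = (sample_mean - pop_mean) / se            # step 4: z = (M - mu) / SE
    # step 5: area in both tails beyond |z| under the standard normal curve
    p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
    return se, z, p_two_tailed

# Hypothetical numbers, for illustration only
se, z, p = z_test(sample_mean=104, sample_n=36, pop_mean=100, pop_sd=15)
print(f"SE = {se:.2f}, z = {z:.2f}, two-tailed p = {p:.3f}")
```

For a one-tailed hypothesis, the probability in a single tail would be half of the two-tailed value.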

Homework: Marie Claire data (for the last time, I promise)
1. Work with the entire data set (do not separate men and women)
   1. Get SPSS to calculate z-scores for both dependent variables
   2. Choose any 4 individuals and for each individual compare his/her z-score in distance with his/her z-score in time. Are the two z-scores similar for these 4 individuals?
   3. Why or why not?
2. We want to know whether this sample of men and women comes from the population of anti-consumerists. We know that the population of anti-consumerists spend, on average, 9 minutes shopping with a standard deviation of 3 minutes.
   1. Calculate the z-score for our sample.
   2. Look up the associated percentages.
   3. Make a decision: are you confident that our sample comes from the population of anti-consumerists?
   4. Why or why not?