Lecture 13. Outliers

Outliers

In this lesson:
1. Finding quartiles in a stem and leaf diagram
2. One definition of an outlier
3. How to classify an observation as an outlier

What you should be able to do:
1. Find the quartiles and the interquartile range of a stem and leaf diagram.
2. Use the method in this lecture to classify observations in a stem and leaf diagram as outliers or not outliers.

Finding Quartiles in a stem and leaf diagram

For the following stem and leaf diagram, find Q1, Q3, and the IQR. Then create your own steps for how the problem should be done. Do this problem in your groups.

[Figure: stem and leaf diagram, n = 30 observations]

Step 1:
Step 2:
Step 3:
Step 4:
Step 5:

HINT: How did you find the quartiles in a frequency table of ungrouped data? How do you find the quartiles of a list?

Solution, with n = 30:

Position of Q1: $\frac{n}{4} = \frac{30}{4} = 7.5 \rightarrow 8$, so $Q_1 = x_8$ = eighth term = 3.2

Position of Q3: $\frac{3n}{4} = \frac{3(30)}{4} = 22.5 \rightarrow 23$, so $Q_3 = x_{23}$ = twenty-third term = 4.0

$IQR = Q_3 - Q_1 = 4.0 - 3.2 = 0.8$
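The position calculation above can be sketched in Python. This is a minimal illustration, not the lecture's own code: the function names are hypothetical, and rounding every fractional position up with `math.ceil` is an assumption based on the two cases shown (7.5 → 8 and 22.5 → 23). The lecture does not say what to do when n/4 is a whole number, so this sketch simply uses that position as is. The ten-value list is made-up data used only to exercise the function.

```python
import math


def quartile_positions(n):
    """Return the 1-based positions of Q1 and Q3 in a sorted list of n values.

    Assumption: fractional positions are rounded up, as in 7.5 -> 8.
    """
    return math.ceil(n / 4), math.ceil(3 * n / 4)


def quartiles(values):
    """Return (Q1, Q3, IQR) using the position rule above."""
    data = sorted(values)
    p1, p3 = quartile_positions(len(data))
    q1 = data[p1 - 1]  # convert 1-based position to 0-based index
    q3 = data[p3 - 1]
    return q1, q3, q3 - q1


# The lecture's case: n = 30 gives the 8th and 23rd terms.
print(quartile_positions(30))  # (8, 23)

# Hypothetical data, only to show the call:
print(quartiles([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]))  # (3, 8, 5)
```

Other textbooks and software use different quartile conventions (for example, averaging two neighbouring terms), so results from a calculator or statistics package may differ slightly from this rule.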

One Definition of an Outlier

Earlier we defined an outlier as an observation that is very far away from other observations. There is, however, a small problem with this definition: WHAT DOES THIS MEAN?

How do we decide what is very far and what is not? Is it just a personal decision? How do different statisticians agree on what is and is not an outlier if they have different ideas on what is very far? To solve this problem there is one definition we can use:

Definition (Outlier): An outlier is any observation that lies more than 1.5(IQR) below the first quartile or more than 1.5(IQR) above the third quartile.

Formula:
Upper outlier $> Q_3 + 1.5(IQR)$
Lower outlier $< Q_1 - 1.5(IQR)$

REMEMBER: This is a definition of an outlier, not the definition of an outlier.
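As a rough sketch, the two cut-off values in the formula can be computed as follows. The function name `outlier_fences` and the parameter `k` are illustrative assumptions and do not come from the lecture. The inputs Q1 = 3.2 and Q3 = 4.0 are the quartiles found in the earlier stem and leaf example.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return (lower, upper) cut-offs: Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


lower, upper = outlier_fences(3.2, 4.0)
# Rounded for display because binary floating point cannot store 3.2 exactly.
print(round(lower, 2), round(upper, 2))  # 2.0 5.2
```

Keeping the multiplier as a parameter reflects the reminder above: 1.5 is one convention, and a different definition of an outlier would use a different rule.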

How to classify an observation as an outlier

For the following stem and leaf diagram, find all of the outliers. (Q1 = 3.2, Q3 = 4.0, IQR = 0.8)

Step 1: Find Q1 and Q3.
Step 2: Find the IQR.
Step 3: Calculate $Q_1 - 1.5(IQR)$. Any observations smaller than that number are outliers.
Step 4: Calculate $Q_3 + 1.5(IQR)$. Any observations larger than that number are outliers.

$Q_1 - 1.5(IQR) = 3.2 - 1.5(0.8) = 2.0$
$Q_3 + 1.5(IQR) = 4.0 + 1.5(0.8) = 5.2$

There are no observations less than 2.0, but 5.5 > 5.2, so 5.5 is an outlier.
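Steps 3 and 4 can be sketched as a small Python function. The name `find_outliers` is hypothetical, and the list passed to it below is not the lecture's data set: only 5.5, Q1 = 3.2 and Q3 = 4.0 are taken from the example, while the other values are made up to show observations that fall inside the cut-offs.

```python
def find_outliers(values, q1, q3, k=1.5):
    """Return the observations below Q1 - k*IQR or above Q3 + k*IQR."""
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    # Strict inequalities, matching the lecture's "smaller than" / "larger than".
    # Note: a value lying exactly on a cut-off can be misclassified because of
    # floating-point rounding; decimal.Decimal avoids this if it matters.
    return [x for x in values if x < lower or x > upper]


# Hypothetical observations, except 5.5 which is the lecture's outlier.
print(find_outliers([2.5, 3.2, 3.6, 4.0, 5.5], q1=3.2, q3=4.0))  # [5.5]
```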