Population. Sample. AP Statistics Notes for Chapter 1 Section 1.0 Making Sense of Data. Statistics: Data Analysis:

Similar documents
Chapter 1: Exploring Data

V. Gathering and Exploring Data

CHAPTER 3 Describing Relationships

2.4.1 STA-O Assessment 2

Chapter 1. Picturing Distributions with Graphs

Section 1.2 Displaying Quantitative Data with Graphs. Dotplots

Unit 1 Exploring and Understanding Data

Stats 95. Statistical analysis without compelling presentation is annoying at best and catastrophic at worst. From raw numbers to meaningful pictures

Introduction to Statistical Data Analysis I

Understandable Statistics

4.3 Measures of Variation

M 140 Test 1 A Name SHOW YOUR WORK FOR FULL CREDIT! Problem Max. Points Your Points Total 60

Outline. Practice. Confounding Variables. Discuss. Observational Studies vs Experiments. Observational Studies vs Experiments

Chapter 1: Exploring Data

Test 1C AP Statistics Name:

Data, frequencies, and distributions. Martin Bland. Types of data. Types of data. Clinical Biostatistics

STP226 Brief Class Notes Instructor: Ela Jackiewicz

Medical Statistics 1. Basic Concepts Farhad Pishgar. Defining the data. Alive after 6 months?

Example The median earnings of the 28 male students is the average of the 14th and 15th, or 3+3

Probability and Statistics. Chapter 1

Name AP Statistics UNIT 1 Summer Work Section II: Notes Analyzing Categorical Data

Undertaking statistical analysis of

Identify two variables. Classify them as explanatory or response and quantitative or explanatory.

AP Statistics. Semester One Review Part 1 Chapters 1-5

How to interpret scientific & statistical graphs

What you should know before you collect data. BAE 815 (Fall 2017) Dr. Zifei Liu

Section I: Multiple Choice Select the best answer for each question.

WDHS Curriculum Map Probability and Statistics. What is Statistics and how does it relate to you?

Statistical Methods Exam I Review

Department of Statistics TEXAS A&M UNIVERSITY STAT 211. Instructor: Keith Hatfield

AP Stats Review for Midterm

Frequency distributions

Methodological skills

Table of Contents. Plots. Essential Statistics for Nursing Research 1/12/2017

Business Statistics (ECOE 1302) Spring Semester 2011 Chapter 3 - Numerical Descriptive Measures Solutions

SPRING GROVE AREA SCHOOL DISTRICT. Course Description. Instructional Strategies, Learning Practices, Activities, and Experiences.

Statistics is a broad mathematical discipline dealing with

M 140 Test 1 A Name (1 point) SHOW YOUR WORK FOR FULL CREDIT! Problem Max. Points Your Points Total 75

STT315 Chapter 2: Methods for Describing Sets of Data - Part 2

Business Statistics Probability

Biostatistics. Donna Kritz-Silverstein, Ph.D. Professor Department of Family & Preventive Medicine University of California, San Diego

2.75: 84% 2.5: 80% 2.25: 78% 2: 74% 1.75: 70% 1.5: 66% 1.25: 64% 1.0: 60% 0.5: 50% 0.25: 25% 0: 0%

STT 200 Test 1 Green Give your answer in the scantron provided. Each question is worth 2 points.

Summarizing Data. (Ch 1.1, 1.3, , 2.4.3, 2.5)

LOTS of NEW stuff right away 2. The book has calculator commands 3. About 90% of technology by week 5

Part 1. For each of the following questions fill-in the blanks. Each question is worth 2 points.

Math 2200 First Mid-Term Exam September 22, 2010

The normal curve and standardisation. Percentiles, z-scores

Descriptive Statistics Lecture

Organizing Data. Types of Distributions. Uniform distribution All ranges or categories have nearly the same value a.k.a. rectangular distribution

Observational studies; descriptive statistics

3: Summary Statistics

Student Performance Q&A:

People have used random sampling for a long time

DO NOT OPEN THIS BOOKLET UNTIL YOU ARE TOLD TO DO SO

Quantitative Methods in Computing Education Research (A brief overview tips and techniques)

Chapter 1: Introduction to Statistics

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo

Instructions and Checklist

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

VU Biostatistics and Experimental Design PLA.216

Measurement and Descriptive Statistics. Katie Rommel-Esham Education 604

UNIVERSITY OF TORONTO SCARBOROUGH Department of Computer and Mathematical Sciences Midterm Test February 2016

STATISTICS 8 CHAPTERS 1 TO 6, SAMPLE MULTIPLE CHOICE QUESTIONS

MINUTE TO WIN IT: NAMING THE PRESIDENTS OF THE UNITED STATES

Still important ideas

Chapter 23. Inference About Means. Copyright 2010 Pearson Education, Inc.

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo

Key: 18 5 = 1.85 cm. 5 a Stem Leaf. Key: 2 0 = 20 points. b Stem Leaf Key: 2 0 = 20 cm. 6 a Stem Leaf. c Stem Leaf

Averages and Variation

PRINTABLE VERSION. Quiz 1. True or False: The amount of rainfall in your state last month is an example of continuous data.

STAT243 LS: Intro to Probability and Statistics Quiz 1, Feb 10, 2017 KEY

UF#Stats#Club#STA#2023#Exam#1#Review#Packet# #Fall#2013#

AP Psych - Stat 1 Name Period Date. MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

AP STATISTICS 2010 SCORING GUIDELINES (Form B)

Chapter 3: Examining Relationships

Chapter 1 Where Do Data Come From?

Chapter 3: Describing Relationships

q2_2 MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

Standard Deviation and Standard Error Tutorial. This is significantly important. Get your AP Equations and Formulas sheet

Empirical Rule ( rule) applies ONLY to Normal Distribution (modeled by so called bell curve)

Announcement. Homework #2 due next Friday at 5pm. Midterm is in 2 weeks. It will cover everything through the end of next week (week 5).

Statistics is the science of collecting, organizing, presenting, analyzing, and interpreting data to assist in making effective decisions

Chapter 1: Explaining Behavior

Knowledge discovery tools 381

Statistics: Interpreting Data and Making Predictions. Interpreting Data 1/50

Statistics is the science of collecting, organizing, presenting, analyzing, and interpreting data to assist in making effective decisions

Students will understand the definition of mean, median, mode and standard deviation and be able to calculate these functions with given set of

3. For a $5 lunch with a 55 cent ($0.55) tip, what is the value of the residual?

Test 1 Version A STAT 3090 Spring 2018

Still important ideas

Readings: Textbook readings: OpenStax - Chapters 1 11 Online readings: Appendix D, E & F Plous Chapters 10, 11, 12 and 14

HW 1 - Bus Stat. Student:

A) I only B) II only C) III only D) II and III only E) I, II, and III

(a) 50% of the shows have a rating greater than: impossible to tell

First Hourly Quiz. SW 430: Research Methods in Social Work I

BIOSTATS 540 Fall 2017 Exam 1 Page 1 of 12

CHAPTER 2. MEASURING AND DESCRIBING VARIABLES

Lesson 1: Distributions and Their Shapes

AP Psych - Stat 2 Name Period Date. MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

Transcription:

Section 1.0 Making Sense of Data Statistics: Data Analysis: Individuals objects described by a set of data Variable any characteristic of an individual Categorical Variable places an individual into one of several groups or categories. Quantitative Variable takes numerical values for which it makes sense to find an average. Distribution: Population Sample Collect data from a representative Sample... Make an Inference about the Population. Perform Data Analysis, keeping probability in mind 1

Section 1.1 Analyzing Categorical Data Counts v. Percents Bar Graphs Pie Charts Scales Pictographs Class/Bins, Labels & Consistent Units Two-Way Tables:

Example on page 1 Response Percent Young adults by gender and chance of getting rich Female Male Total Almost no chance 96 98 194 Some chance, but probably not 46 86 71 A 50-50 chance 696 70 1416 A good chance 663 758 141 Almost certain 486 597 1083 Total 367 459 486 Almost no chance Some chance A 50-50 chance A good chance Almost certain Write a short description of your graph: Side-by-side bar graph Segmented/stacked bar graph 100% 80% Percent 60% 40% 0% 0% Chance of being wealthy by age 30 Almost certain Good chance 50-50 chance Some chance Almost no chance Males Females 3 Opinion

Marginal Distribution: Round Off Error: Conditional Distributions: Response Male Almost no chance Some chance A 50-50 chance A good chance Almost certain Notes: When Describing Categorical Data: Percents are often more useful than counts No single graph can show all the data from a two-way table easily Caution! Even a strong association between two categorical variables can be influenced by other variables lurking in the background. 4

Section 1. Displaying Quantitative Data with Graphs Dotplots: Describing the Pattern of a Distribution 1) In any graph, look for the overall pattern and for striking departures from that pattern. ) Describe the overall pattern of a distribution by its: Shape Center Spread Shape: Approximately Symmetric Skewed Right Skewed Left 5

Modality Unimodal Bimodal Describe the shape of the dotplot on page 5 of your notes: Center: Spread: Unique Characteristics: Outliers: Deviations from Pattern 6

*For more details see the examples starting on page 6 of your textbook. Stem & Leaf Plot (Stemplot) Regular Stemplot:, 8, 0, 1, 1,, 3, 33, 34, 35, 36, 49, 49, 50 Generally when making a stemplot you want to have at least 4 stems. If there are too few stems it is difficult to see the shape; but the same is also true for too many stems! Split Stems: 19, 0, 1,,, 4, 5, 5, 6, 8, 30, 30, 33, 39 Stemplot Key Back-to-back Stemplots: A: 11, 1, 0, 34 B: 11, 1, 13, 3, 8, 9, 36, 37, 38 7

Histogram How to make a histogram: 1) Divide the range of data into classes of equal width. ) Find the count (frequency) or percent (relative frequency) of individuals in each class. 3) Label and scale your axes and draw the histogram. The height of the bar equals its frequency. Adjacent bars should touch, unless a class contains no individuals. Calculator: Using frequency list to plot histograms when there are many repeated numbers: Class Frequency Relative Frequency Cumulative Frequency Graphs & Charts Cautions! 1) Don t confuse histograms and bar graphs. ) Don t use counts (in a frequency table) or percents (in a relative frequency table) as data. 3) Use percents instead of counts on the vertical axis when comparing distributions with different numbers of observations. 4) Just because a graph looks nice, it s not necessarily a meaningful display of data. 8

Section 1.3 Describing Quantitative Data with Numbers Measures of Center: Mean v. Median Mean: xi x = n = x1 + x +... + xn n Non-Resistant Measure Median (M) Method of Calculation: If odd number of numbers If even number of numbers Resistant Measure What does comparing the Mean and Median tell us? Excerpt from: America's middle class: Poorer than you think By Tami Luhby (CNNMoney) Rich Americans. That's our global reputation. The numbers seem to back it up. Americans' average wealth tops $301,000 per adult, enough to rank us fourth on the latest Credit Suisse Global Wealth report. But that figure doesn't tell you how the middle class American is doing. Americans' median wealth is a mere $44,900 per adult -- half have more, half have less. That's only good enough for 19th place, below Japan, Canada, Australia and much of Western Europe. "Americans tend to think of their middle class as being the richest in the world, but it turns out, in terms of wealth, they rank fairly low among major industrialized countries," said Edward Wolff, a New York University economics professor who studies net worth. Why is there such a big difference between the two measures? 9

Use the data below to calculate the mean and median of the commuting times (in minutes) of 0 randomly selected New York workers. Commute Times 0 5 1 005555 0005 3 00 4 005 5 6 005 7 8 5 Key: 4 5 represents a New York worker who reported a 45- minute travel time to work. Find the median on the stemplot: Find the mean and median using your calculator: A measure of center alone can be misleading. A useful numerical description of a distribution requires both a measure of center and a measure of spread. Measures of Spread: Range: the range is a single number that represents the difference between the max and the min. In everyday language we sometimes say the data values range from 5 to 35. But the correct way to say this is the Range = 30 (from 5 to 35). To calculate the quartiles, arrange the observations in increasing order and locate the median. 1. The second quartile Q is the median of the data set. It is also the 50 th percentile.. The first quartile Q 1 is the median of the observations located to the left of the median in the ordered list. It is also the 5 th percentile. 3. The third quartile Q 3 is the median of the observations located to the right of the median in the ordered list. It is also the 75 th percentile. 4. The fourth quartile Q 4 is the maximum of the data set. It is also the 100 th percentile. The interquartile range (IQR) is defined as: 10

Examples to find quartiles: 1) 1,1,3,5,9,10,11 ) 4,6,6,7,8,9 3) 3,8,9,1,14,18,0,30 In addition to serving as a measure of spread, the interquartile range (IQR) is used as part of a rule of thumb for identifying outliers. The 1.5 x IQR Rule for Outliers An observation is an outlier if it falls more than 1.5 x IQR above the third quartile or below the first quartile. Using the fence/gateway/cut-off to find outliers. Practice: Use the Commute Time data on page 10 of your notes packet to determine if there are outliers. The 5-Number Summary Minimum, Q 1, M, Q 3, Maximum Calculator: 1-Var Stats 11

Boxplot (Box & Whisker Plot) How To Make A Boxplot: A central box is drawn from the first quartile (Q ) to the third quartile (Q ). 1 3 A line in the box marks the median. Lines (called whiskers) extend from the box out to the smallest and largest observations that are not outliers. Outliers are marked with a special symbol such as an asterisk (*). Modified Boxplot vs. Regular Boxplot Side-by-side boxplots Calculator: (always select the modified boxplot) Use trace to find values on the boxplot 1

Standard Deviation (s x ) measures the average distance of the sample observations from their mean. It is calculated by finding an average of the squared distances and then taking the square root. The average squared distance is called the variance. Formulas: standard deviation = s x = 1 n 1 ( x i x) For more on standard deviation go to page 60 of your book variance = s x ( x = 1 x) + ( x x) +... + ( x n 1 n x) 1 = n 1 ( x i x) Choosing Measures of Center and Spread Numbers can lie! (or at the very least be misleading) Numerical summaries do not fully describe the shape of a distribution. ALWAYS PLOT YOUR DATA! The median and IQR are usually better than the mean and standard deviation for describing a skewed distribution or a distribution with outliers. Use mean and standard deviation only for reasonably symmetric distributions that don t have outliers. Properties of s x : s x is standard deviation of a sample σ x is standard deviation of a population How to Organize a Statistical Problem: A Four-Step Process State: What s the question that you re trying to answer? Plan: How will you go about answering the question? What statistical techniques does this problem call for? Do: Make graphs and carry out needed calculations. Conclude: Give your conclusion in the setting of the real-world problem. 13

Homework Assignments & Section Summaries 1.0 And 1.1 Homework Exercises #1 31 odd, and # 8, 30, 3 (skip # 9) 18 questions Section 1.0 Summary A dataset contains information on individuals. For each individual, data give values for one or more variables. Variables can be categorical or quantitative. The distribution of a variable describes what values it takes and how often it takes them. Inference is the process of making a conclusion about a population based on a sample set of data. Section 1.1 Summary DISPLAY categorical data with a bar graph IDENTIFY what makes some graphs of categorical data deceptive CALCULATE and DISPLAY the marginal distribution of a categorical variable from a two-way table CALCULATE and DISPLAY the conditional distribution of a categorical variable for a particular value of the other categorical variable in a two-way table DESCRIBE the association between two categorical variables 1. Homework Exercises #37-73 odd and # 60, 70, 7, 74, (skip # 49, 51, 57, 61, 63, 67) 17 questions Section 1. Summary MAKE and INTERPRET dotplots and stemplots of quantitative data DESCRIBE the overall pattern of a distribution IDENTIFY the shape of a distribution MAKE and INTERPRET histograms of quantitative data COMPARE distributions of quantitative data 1.3 Homework Exercises #79 109 odd, and # 108, 110 (skip # 85, 101) 16 questions Section 1.3 Summary CALCULATE measures of center (mean, median). CALCULATE and INTERPRET measures of spread (range, IQR, standard deviation). CHOOSE the most appropriate measure of center and spread in a given setting. IDENTIFY outliers using the 1.5 IQR rule. MAKE and INTERPRET boxplots of quantitative data. USE appropriate graphs and numerical summaries to compare distributions of quantitative variables. 14