MATH 1040 Skittles Data Project

Laura Boren
MATH 1040 Data Project

For our project in MATH 1040, everyone in the class was asked to buy a 2.17-ounce individual-sized bag of Skittles and count the number of candies of each color in the bag. The class data was compiled, and we used it for a series of exercises, each involving a different aspect of statistics.

For the first part of the project, we determined the proportion of each color of candy and created a Pareto chart and a pie chart of the total number of candies of each color in the entire class. We compared the class data to our own personal data and noted any similarities or differences.

For part 2 of the project, we used the Skittles data to compute summary statistics: the mean, the standard deviation, and the 5-number summary. We made a frequency histogram and a box plot of the total number of candies per bag. Individually, I also wrote a paragraph about the appropriate methods of analysis for qualitative and quantitative data.

The last part of the project involved confidence intervals. We found 3 different confidence intervals, for the population proportion, mean, and standard deviation, and wrote an analysis of what each confidence interval meant.

Laura Boren, Melissa Oneal, Justin Peck, Nathan Schafer
Math 1040 Class Proportions

Color     Count   Proportion of Total
Red         564   0.199
Orange      564   0.199
Green       566   0.199
Purple      559   0.197
Yellow      586   0.206
Total      2839   1.000

(Total is the total number of candies in the class.)
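As a rough check, the class proportions can be recomputed from the color counts in a few lines of Python. This is only a sketch: the dictionary and the helper name `proportions` are illustrative choices and not part of the original assignment.

```python
# Class color counts as reported in the class proportions table.
class_counts = {"Red": 564, "Orange": 564, "Green": 566, "Purple": 559, "Yellow": 586}


def proportions(counts):
    """Return each category's share of the total, rounded to three decimals."""
    total = sum(counts.values())
    return {color: round(n / total, 3) for color, n in counts.items()}


class_props = proportions(class_counts)
class_total = sum(class_counts.values())

# Print a small table: color, count, proportion of total.
for color, p in class_props.items():
    print(f"{color:<8}{class_counts[color]:>6}{p:>8.3f}")
print(f"{'Total':<8}{class_total:>6}")
```

The same helper can be applied to a single bag's counts to compare one bag against the class.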

Laura Boren, Melissa Oneal, Justin Peck, Nathan Schafer

Does the class data represent a random sample?
Yes, the class data does represent a random sample. Each student was asked to buy their own bag of Skittles, so not every bag of Skittles in the region had an equal chance of being selected, but the distribution of Skittles from the central plant/warehouse was most likely random. The Skittles company most likely does not count colors as it loads the bags and simply fills them by weight. Assuming students did not make any biased decisions about which bag to grab off the shelf, every bag produced had an equal chance of being shipped to any location in the country and being selected at random by a student in the class.

What would the population be?
In this study, the sample is the class data. Since not everyone in the class is currently living in the same state, the population would be all 2.17-ounce Skittles bags in the United States. There are currently different manufacturing plants operating overseas; therefore, the population can only reasonably be expanded to include the United States distribution circuit.

Laura Boren
Math 1040 Data

Color     Class Total   Proportion   My Total   Proportion
Red           564         0.199         16        0.258
Yellow        586         0.206         11        0.177
Orange        564         0.199         10        0.161
Green         566         0.199         15        0.242
Purple        559         0.197         10        0.161
Total        2839                       62

My Skittles bag differed quite a bit from the class data. My bag had a noticeably higher proportion of red and green Skittles than the class total, but, like the class data, it had purple among the least common colors (tied with orange in my bag). I had always assumed that red was the most common Skittles color, but that may just be because red is so vibrant and gets noticed more. In my Skittles bag red was the most common color, but that was not supported by the class data. I was surprised to see that yellow was the most common color in the class.

1. Using the total number of candies in each bag in our class sample, compute the following measures for the variable "Total candies in each bag". Report these summary statistics rounded to one decimal place, if needed.

(a) Mean number of candies per bag
The mean number of candies per bag is 59.1 candies.

(b) Standard deviation of the number of candies per bag
The standard deviation per bag is 6.4 candies.

(c) 5-number summary for the number of candies per bag
The 5-number summary (minimum, Q1, median, Q3, maximum) is 34, 58, 60, 62, 71.
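The three measures above can be computed with Python's standard `statistics` module, as in the sketch below. The per-bag totals for the class are not listed in this report, so the `toy_bags` list is made-up illustrative data, not the class data. Quartile conventions also differ between textbooks and calculators; the "inclusive" method used here is one assumption among several.

```python
from statistics import mean, quantiles, stdev

# Made-up bag totals for illustration only. These are NOT the class data.
toy_bags = [55, 58, 59, 60, 61, 62, 64]


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the 'inclusive' quartile method."""
    ordered = sorted(values)
    q1, med, q3 = quantiles(ordered, n=4, method="inclusive")
    return (ordered[0], q1, med, q3, ordered[-1])


toy_mean = mean(toy_bags)    # arithmetic mean
toy_sd = stdev(toy_bags)     # sample standard deviation (n - 1 in the denominator)
toy_summary = five_number_summary(toy_bags)

print(f"mean = {toy_mean:.1f}, sd = {toy_sd:.1f}")
print("5-number summary:", toy_summary)
```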

Math 1040 Skittle Data 2015

Laura Boren
Data Part 3

1. From these graphs we can conclude that the frequency histogram is skewed to the left, although our box plot appeared rather symmetrical, likely because the number line did not use small enough increments. This skew is consistent with the summary statistics: the median number of candies per bag is 60, but the mean is only 59.1. One of the main causes of the negative skew is that several of the Skittles bags had only 30-40 candies in them, as little as roughly half the median number of Skittles per bag. Those bags are outliers, and they pull the mean towards the left. My data agrees with the data collected by the whole class because the highest frequency of candies per bag was between 60 and 65 candies per bag. My bag had 62 candies, which falls right in that class.

2. Categorical variables are also known as qualitative variables. These variables can be put into different categories, such as a model of car, color, gender, etc. Quantitative data consists of numerical values that can be measured or counted. The number of candies in a bag of Skittles is quantitative, whereas the color of the candy is categorical.

Graphing quantitative data is best done with histograms, stem-and-leaf plots, dot plots, bar graphs, and box plots. All of these types of graphs can be used to display the distribution of a quantitative variable. Categorical data is best graphed using a method that lets you compare the groups to one another. A bar graph can work for both quantitative and categorical data, but a pie chart doesn't make sense for quantitative data because it compares categories to the whole. A pie chart would effectively show the percentage of each color of Skittles in a bag (categorical data), but cannot effectively be used to show the number of Skittles in a bag (quantitative data).

When it comes to calculations, mean and median only make sense for quantitative data. The mean is the arithmetic average of the values in a sample, so it is only meaningful when applied to quantitative data. The median represents the middle value of the ordered data and, once again, only makes sense when applied to quantitative data. The best measure of central tendency to apply to categorical data is the mode. When looking at the colors of candy in a Skittles bag, you cannot find an average color or a median color, but you can establish which color occurs most often. Likewise, when looking at the number of candies in a Skittles bag, the most useful measures of center are the mean and median number of Skittles.

Laura Boren, Nathan Schafer, Justin Peck, Melissa Oneal

99% Confidence Interval estimate for the population proportion of yellow candies

X = 586
n = 2839
Z-value for 99% CI = 2.576
p = 586/2839 = 0.206
Standard error = sqrt(p(1 - p)/n) = 0.007596
0.206 +/- 2.576 * (0.007596)
0.206 +/- 0.01957
99% Confidence Interval Estimate: (0.186, 0.226)

Confidence interval estimates of a population proportion are used to estimate, with the specified degree of confidence, the proportion of a characteristic found within a population. In relation to the Skittles, we are 99% confident that the proportion of yellow candies among all Skittles in the population falls between 0.186 and 0.226.

95% Confidence Interval estimate for the population mean number of Skittles per bag

n = 49
Sx = 6.38
Sample mean = 59.15
Standard error of the mean = Sx/sqrt(n) = 0.9114

The degrees of freedom are n - 1 = 48. To find the t-value, a t-table was consulted using the nearest tabulated row, 50 degrees of freedom. The t-value is 2.009.

59.15 +/- 2.009 * (0.9114)
59.15 + 1.83 = 60.98
59.15 - 1.83 = 57.32
95% Confidence Interval Estimate: (57.32, 60.98)

Confidence interval estimates of the population mean use sample data to construct an interval that, with the specified degree of confidence, should contain the mean of the population. In this case, we are 95% confident that the population mean number of Skittles per bag is between 57.32 and 60.98.
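The two intervals above can be checked with a short Python sketch. The function names `proportion_ci` and `mean_ci` are illustrative, not part of the assignment. The z critical value is computed with `statistics.NormalDist`, while the t critical value 2.009 is simply passed in from the t-table lookup reported above, since the standard library has no t-distribution. Carrying the proportion at full precision gives a lower bound near 0.187 rather than 0.186; the small difference arises because the hand calculation rounded p to 0.206 before subtracting the margin of error.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(x, n, confidence):
    """Normal-approximation (z) interval for a population proportion."""
    p_hat = x / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def mean_ci(sample_mean, s, n, t_crit):
    """t interval for a population mean; t_crit comes from a t-table."""
    margin = t_crit * s / sqrt(n)
    return sample_mean - margin, sample_mean + margin


# 99% interval for the proportion of yellow candies (586 of 2839).
lo_p, hi_p = proportion_ci(586, 2839, 0.99)
# 95% interval for the mean candies per bag, with t = 2.009 from the table.
lo_m, hi_m = mean_ci(59.15, 6.38, 49, 2.009)

print(f"proportion: ({lo_p:.3f}, {hi_p:.3f})")
print(f"mean:       ({lo_m:.2f}, {hi_m:.2f})")
```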

Laura Boren, Nathan Schafer, Justin Peck, Melissa Oneal

98% Confidence Interval estimate for the population standard deviation of the number of candies per bag

n = 49
s = 6.378
s² = 40.679
Area to the right for χ²(1-a/2) = 0.99
Area to the right for χ²(a/2) = 0.01

The degrees of freedom are n - 1 = 48. On the chi-square distribution chart, the nearest tabulated row, 50 degrees of freedom, was used. The value for χ²(1-a/2) was 29.707. For χ²(a/2) it was 76.154.

Each bound is sqrt[ s² * (df) / chi-square value ], with df = 48:
Lower bound: sqrt(40.679 * 48 / 76.154) = 5.06
Upper bound: sqrt(40.679 * 48 / 29.707) = 8.11

Confidence interval estimates of the population standard deviation use the sample standard deviation to generate an interval that the population standard deviation of the number of candies should fall within, with the specified level of confidence. In this case, we are 98% confident that the population standard deviation is between 5.06 and 8.11 candies. The problem with confidence interval estimates based on the sample standard deviation is that the sample standard deviation may be quite different from the actual population standard deviation.
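The bounds for the standard deviation can be reproduced with the sketch below. It assumes n - 1 = 48 degrees of freedom in the numerator, which matches the reported bounds, and takes the two chi-square critical values (29.707 and 76.154) directly from the chart lookup described above rather than computing them. The function name `std_dev_ci` is illustrative.

```python
from math import sqrt


def std_dev_ci(s, df, chi2_small, chi2_large):
    """Chi-square interval for a population standard deviation.

    chi2_small and chi2_large are the left and right critical values
    read from a chi-square table. Dividing by the larger critical value
    gives the lower bound, and dividing by the smaller gives the upper.
    """
    variance_times_df = df * s ** 2
    lower = sqrt(variance_times_df / chi2_large)
    upper = sqrt(variance_times_df / chi2_small)
    return lower, upper


# s = 6.378 from the class data, df = n - 1 = 48, table values as reported.
lo_sd, hi_sd = std_dev_ci(6.378, 48, 29.707, 76.154)
print(f"98% CI for the standard deviation: ({lo_sd:.2f}, {hi_sd:.2f})")
```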

Laura Boren

The purpose of taking sample data and calculating statistics from them is to apply those statistics to a larger population. Since a population is larger than a sample, how well a sample statistic estimates a population parameter is an issue. A confidence interval helps to address that issue by providing a range of values that the population parameter is likely to fall within. The intervals are constructed with a certain level of confidence, expressed as a percentage such as 95%, 98% or 99%. This means that if the same population were sampled repeatedly and an interval calculated from each sample, about X% of those intervals would contain the true parameter, where X% is the confidence level.
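The repeated-sampling interpretation can be illustrated with a small simulation. Everything in this sketch is hypothetical: the true proportion of 0.2 (what five equally common colors would give), the sample size of 500 candies, and the 1000 repetitions are assumptions chosen for illustration, not values from the class data. The printed coverage should land near the nominal 95%, though it varies with the random seed.

```python
import random
from math import sqrt
from statistics import NormalDist


def coverage(true_p, n, confidence, trials, seed=0):
    """Fraction of simulated z intervals that contain the true proportion."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    hits = 0
    for _ in range(trials):
        # Draw one sample of n candies; count how many are "yellow".
        x = sum(rng.random() < true_p for _ in range(n))
        p_hat = x / n
        margin = z * sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - margin <= true_p <= p_hat + margin:
            hits += 1
    return hits / trials


# Hypothetical settings: true proportion 0.2, samples of 500 candies.
rate = coverage(true_p=0.2, n=500, confidence=0.95, trials=1000)
print(f"Share of intervals containing the true proportion: {rate:.3f}")
```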

Laura Boren
Skittle Project Reflection

When I first started the project, I was intimidated by the process of using statistical concepts to interpret real-life data. As the project went on, I became much more comfortable with concepts such as confidence intervals and with creating Pareto charts and frequency histograms. In my volunteer work as a lactation educator and also as a nursing student, I sometimes find myself reading and interpreting peer-reviewed clinical research. Understanding what confidence intervals are and what makes data significant or unusual is very helpful in interpreting such studies and thinking critically about what the data actually mean.

There are even some aspects of statistics that I used before taking this class. In Human Physiology we were required to calculate the mean, median, and standard deviation of lung inspiratory volume as part of our laboratory unit on the respiratory system. Taking calculus really helped me to understand real-world math applications, and statistics only supported what I already knew about the practicality of math.

Statistics is a fundamental part of scientific literacy and has numerous applications in the world of business and economics. Completing the Skittles project helped me to understand how businesses and corporations might need to use statistics, particularly standard deviations, in order to produce accurate and consistent products. Statistics can also be used to calculate demand, determine shipping and distribution needs, and evaluate product quality and customer satisfaction. In our Skittles project we determined the proportion of each color of Skittles candy in the class sample, as well as a confidence interval for the population proportion. This could be helpful in evaluating customer candy preferences and overall satisfaction based on flavor preference. A company might use similar statistics in real life to ensure product standardization.