STAT 608 Guided Exercise 1

Be sure to:
- Please submit your answers in a Word file to Sakai, in the same place where you downloaded this file. Remember that you can paste any Excel or JMP output into a Word file (use Paste Special for best results).
- Put your name and the assignment number in the file name, e.g., Ilvento Guided1.doc.
- Answer as completely as you can and show your work.

1. I love data controversies! Read these articles on Body Mass Index (BMI); the letter is a follow-up to the original story in the Wilmington News Journal. So what is the BMI? Here is the information from the website of the National Institutes of Health:

BMI is a reliable indicator of total body fat, which is related to the risk of disease and death. The score is valid for both men and women, but it does have some limits. The limits are:

- It may overestimate body fat in athletes and others who have a muscular build.
- It may underestimate body fat in older persons and others who have lost muscle mass.

The formula for BMI is:

Metric formula: $\text{BMI} = \dfrac{\text{weight (kg)}}{[\text{height (m)}]^2}$
Example: weight = 68 kg, height = 165 cm (1.65 m)
Calculation: $68 / (1.65)^2 = 24.98$

Pounds/inches formula: $\text{BMI} = \dfrac{\text{weight (lb)}}{[\text{height (in.)}]^2} \times 703$
Example: weight = 150 lb, height = 5'5" (65 in.)
Calculation: $[150 / (65)^2] \times 703 = 24.96$

The National Institutes of Health uses the following BMI categories:
- Underweight: below 18.5
- Normal weight: 18.5-24.9
- Overweight: 25-29.9
- Obesity: 30 or greater

BMI is an indicator variable: it seeks to measure something complex in an easier, cheaper, and still meaningful way. There are other ways to measure body fat, but they are more costly and more invasive (e.g., you have to get into a body of water). With the BMI, you only need a person's height and weight. To be considered valid, an indicator variable should be highly correlated (for now, think of "correlated" as "related") with a more accurate measure. Thus, measures of total body fat and BMI should agree across a wide sample of subjects.

A few additional things to note:
- It is true that the definition of being overweight changed in 1998.
- It is also true that assessing the RISK from being overweight or obese involves other factors, such as high blood pressure, cholesterol levels, and smoking.
- The writer of the first article does represent industries with an interest in selling food products.

So, what do you think? Is the BMI a useful indicator of how overweight people are? Is the notion of being overweight too highly politicized? Should we be labeling who is or is not overweight or obese? There are no right or wrong answers here, just your opinions!
I just want you to realize that measurement is an important part of many data analyses and that some measures are not as simple as we might think.
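The two BMI formulas and the NIH cut-offs can be checked with a short Python sketch. The function names (`bmi_metric`, `bmi_imperial`, `bmi_category`) are hypothetical, chosen only for illustration. The published categories leave small gaps (e.g., 24.95 is neither "18.5-24.9" nor "25-29.9"), so the sketch assumes that anything below 25 counts as normal weight and anything below 30 as overweight.

```python
def bmi_metric(weight_kg, height_m):
    """BMI = weight (kg) / [height (m)]^2."""
    return weight_kg / height_m ** 2


def bmi_imperial(weight_lb, height_in):
    """BMI = [weight (lb) / height (in)^2] * 703."""
    return weight_lb / height_in ** 2 * 703


def bmi_category(bmi):
    """NIH category labels. Assumption: values between the published bounds
    (such as 24.95) are assigned to the lower category."""
    if bmi < 18.5:
        return "Underweight"
    if bmi < 25:
        return "Normal weight"
    if bmi < 30:
        return "Overweight"
    return "Obesity"


# The two worked examples from the text.
for bmi in (bmi_metric(68, 1.65), bmi_imperial(150, 65)):
    print(f"{bmi:.2f} {bmi_category(bmi)}")  # 24.98, then 24.96; both "Normal weight"
```

Both examples describe roughly the same person, which is why the metric and imperial results differ only in the second decimal place.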

2. Academy Award winners for best actor (and actress) since 1996. Each year the Academy of Motion Picture Arts and Sciences picks a best actor and a best actress in a film. Below are the data for males and females since 1996, along with their ages. We are going to plot and calculate sample statistics for both men and women to make a comparison.

YEAR  ACTOR                   MALE AGE  ACTRESS             FEMALE AGE
1996  Geoffrey Rush           45        Frances McDormand   39
1997  Jack Nicholson          60        Helen Hunt          34
1998  Roberto Benigni         46        Gwyneth Paltrow     26
1999  Kevin Spacey            40        Hilary Swank        25
2000  Russell Crowe           36        Julia Roberts       33
2001  Denzel Washington       47        Halle Berry         35
2002  Adrien Brody            29        Nicole Kidman       35
2003  Sean Penn               43        Charlize Theron     28
2004  Jamie Foxx              37        Hilary Swank        30
2005  Philip Seymour Hoffman  38        Reese Witherspoon   29
2006  Forest Whitaker         45        Helen Mirren        61
2007  Daniel Day-Lewis        50        Marion Cotillard    32
2008  Sean Penn               48        Kate Winslet        33
2009  Jeff Bridges            60        Sandra Bullock      45
2010  Colin Firth             50        Natalie Portman     29
2011  Jean Dujardin           39        Meryl Streep        62

a. Construct a stem-and-leaf plot for each group to compare the distributions.

Stem-and-Leaf Plot of Winners' Ages

Males                        Females
Stem | Leaf                  Stem | Leaf
   6 | 0 0                      6 | 1 2
   5 | 0 0                      5 |
   4 | 0 3 5 5 6 7 8            4 | 5
   3 | 6 7 8 9                  3 | 0 2 3 3 4 5 5 9
   2 | 9                        2 | 5 6 8 9 9
Key: 6 | 0 represents 60     Key: 6 | 1 represents 61

The distribution for males is more symmetrical and centered in the 40s. The distribution for females is centered in the 30s and has two outliers, at 61 and 62.
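The stem-and-leaf plots in part (a) can be generated with a short Python sketch. It assumes non-negative integer data below 100 (true for these ages), uses the tens digit as the stem and the units digit as the leaf, and keeps empty stems (such as the females' 50s) so that gaps in the distribution stay visible. The function name `stem_and_leaf` is illustrative, not part of any library.

```python
from collections import defaultdict

# Ages from the Question 2 table (1996-2011), in table order.
MALE_AGES = [45, 60, 46, 40, 36, 47, 29, 43, 37, 38, 45, 50, 48, 60, 50, 39]
FEMALE_AGES = [39, 34, 26, 25, 33, 35, 35, 28, 30, 29, 61, 32, 33, 45, 29, 62]


def stem_and_leaf(values):
    """Return stem-and-leaf lines (stem = tens digit, leaf = units digit),
    highest stem first, with leaves in ascending order."""
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)
    lines = []
    for stem in range(max(leaves), min(leaves) - 1, -1):
        # .get() does not insert keys; an empty stem prints with no leaves.
        leaf_str = " ".join(str(d) for d in leaves.get(stem, []))
        lines.append(f"{stem} | {leaf_str}".rstrip())
    return lines


for label, ages in (("Males", MALE_AGES), ("Females", FEMALE_AGES)):
    print(label)
    print("\n".join(stem_and_leaf(ages)))
```

Sorting the values first means each stem's leaves come out already in ascending order, which is the usual convention for a finished plot.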

b. Calculate the measures of central tendency and variability for each group. The sum of X and the sum of X-squared for each group are:

                 Male     Female
Sum X            713      576
Sum X-squared    32799    22586

                           Males            Females
Mean                       713/16 = 44.6    576/16 = 36.0
Median                     45.0             33.0
Mode                       no single mode   no single mode
Range                      60 - 29 = 31     62 - 25 = 37
Variance                   68.4             123.3
Standard Deviation         8.3              11.1
Coefficient of Variation   18.6%            30.8%

Neither group has a single mode: among the males, 45, 50, and 60 each occur twice; among the females, 29, 33, and 35 each occur twice.

c. Briefly compare the two distributions with an emphasis on the measures of central tendency and variability.

The mean for males is higher than that for females, 44.6 versus 36.0. However, the mean and the median for males are very close (44.6 versus 45.0), while the mean for females (36.0) is pulled above its median (33.0) by the two outliers. There is more variability in the distribution for females, with a higher variance, standard deviation, and coefficient of variation; all three are inflated by the outliers.

d. For both men and women there are a few outliers. For men there are two individuals with a value of 60. For women there is one winner aged 61 and another aged 62. Calculate z-scores for these values and interpret their meaning.

Males: $z = (60 - 44.56)/8.27 = 1.87$. This observation is 1.87 standard deviations above the mean.
Females: $z_1 = (61 - 36.0)/11.1 = 2.25$. This observation is 2.25 standard deviations above the mean.
Females: $z_2 = (62 - 36.0)/11.1 = 2.34$. This observation is 2.34 standard deviations above the mean.

Suppose we wanted to remove the two female outliers from the data. Calculate the new mean for the remaining 14 women winners. Hint: subtract the values from the old sum and divide by 14. Did the outliers influence the mean age much?

New sum: $576 - 62 - 61 = 453$
New mean: $453/14 = 32.36$

Yes: the mean dropped by 3.64 years (36.0 - 32.36), a decline of about 10%.
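The part (b) and (d) numbers can be reproduced from n, Sum X, and Sum X-squared alone, using the computational formula for the sample variance, $s^2 = [\sum x^2 - (\sum x)^2/n]/(n-1)$. The sketch below is one way to do it in Python; the helper names `summary_from_sums` and `z_score` are hypothetical.

```python
import math


def summary_from_sums(n, sum_x, sum_x2):
    """Mean, sample variance, SD, and CV (%) from n, Sum X, and Sum X^2."""
    mean = sum_x / n
    # Computational formula for the sample variance (divisor n - 1).
    variance = (sum_x2 - sum_x ** 2 / n) / (n - 1)
    sd = math.sqrt(variance)
    cv = 100 * sd / abs(mean)  # coefficient of variation, in percent
    return {"mean": mean, "variance": variance, "sd": sd, "cv": cv}


def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


males = summary_from_sums(16, 713, 32799)
females = summary_from_sums(16, 576, 22586)

for label, s in (("Males", males), ("Females", females)):
    print(f"{label}: mean={s['mean']:.1f} variance={s['variance']:.1f} "
          f"sd={s['sd']:.1f} cv={s['cv']:.1f}%")

# Part (d): z-scores for the outlying winners.
print(f"z(60, males)   = {z_score(60, males['mean'], males['sd']):.2f}")
print(f"z(61, females) = {z_score(61, females['mean'], females['sd']):.2f}")
print(f"z(62, females) = {z_score(62, females['mean'], females['sd']):.2f}")

# Mean of the remaining 14 actresses after removing 61 and 62 from the sum.
trimmed_mean = (576 - 61 - 62) / 14
print(f"trimmed mean   = {trimmed_mean:.2f}")
```

The z-scores here use the unrounded mean and SD; plugging the rounded values 44.6 and 8.3 into the male z-score gives 1.86 instead of 1.87, a reminder to carry extra digits until the final step.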

3. The following data come from The Daily Beast's list of the 50 Most Stressful Universities in 2010. We are looking at the acceptance rate for these 50 universities, which is the percentage of applicants who were admitted. The histogram and the stem-and-leaf plot for these data are given below (note that the stem-and-leaf plot rounds the numbers to whole numbers). Use the stem-and-leaf values for some calculations, such as the minimum and maximum. For other calculations, $\sum x = 1574.70$ and $\sum x^2 = 62204.53$. The median for these data is 26.85.

a. Calculate the:
Mean = 31.49
Median = 26.85
Mode = 22
Maximum = 73
Minimum = 8
Range = 65
Variance = 257.37
Standard Deviation = 16.04
Coefficient of Variation = 50.94%

b. What is the position of the median value for these data?
Since n = 50, the median position is (n + 1)/2 = 25.5, i.e., between the 25th and 26th ordered values. We would take the average of these two values.

c. Does the mode make sense as a measure of central tendency for these data?
Based on the stem-and-leaf plot, the mode is 22%. This marks the center of one bunching of the data, but there is much more spread and there are other groupings of the data, so on its own the mode is a poor summary of the center.

d. Calculate a z-score for an acceptance rate of 61%.
$z = (61 - 31.49)/16.04 = 1.84$. This value is 1.84 standard deviations above the mean.

e. Based on what you know about the different criteria used by different universities to judge students for admittance, why do you think this distribution looks the way it does? Think about the spread of the data and the measures of spread for the data, such as the range and standard deviation. Does the spread seem large? Hint: Harvard has the lowest acceptance rate, at 7.9%. The Pennsylvania State University has an acceptance rate of 51.2%.

The spread is very large: the range is 65 percentage points and the CV is 50.94%. It might reflect differences between public and private institutions. Private institutions generally have lower acceptance rates. Public schools may have, as part of their mission, higher acceptance rates in order to provide educational opportunities to citizens of the state. Even among the most stressful universities, which are generally thought to be the most rigorous, the acceptance rate for public institutions should be higher. We could think of these data as coming from two populations.
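As a check on parts (a), (b), and (d), the same computational formula reproduces the mean, variance, SD, CV, median position, and z-score from the two given sums. This is a sketch only: the median value, mode, and extremes cannot be recovered from the sums and still have to come from the stem-and-leaf plot.

```python
import math

# Summary values given in Question 3 (acceptance rates, in percent).
n = 50
sum_x = 1574.70
sum_x2 = 62204.53

mean = sum_x / n
variance = (sum_x2 - sum_x ** 2 / n) / (n - 1)  # sample variance, divisor n - 1
sd = math.sqrt(variance)
cv = 100 * sd / mean  # coefficient of variation in percent (mean is positive here)

# With n = 50 (even), the median sits at position (n + 1) / 2 = 25.5,
# i.e., the average of the 25th and 26th ordered values.
median_position = (n + 1) / 2

z_61 = (61 - mean) / sd  # z-score for a 61% acceptance rate

print(f"mean={mean:.2f} variance={variance:.2f} sd={sd:.2f} cv={cv:.2f}%")
print(f"median position={median_position} z(61)={z_61:.2f}")
```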

The box plots show a difference between public and private universities. There is still a lot of spread for each type of university (some private universities have high acceptance rates and some public universities have low acceptance rates), but we can see two distinct groups.

4. Answer the following questions about variability of data sets:

a. How would you describe the variance and standard deviation in words, rather than a formula? Think of what you are calculating and how it might be useful in describing a variable.
The variance is essentially the average squared deviation around the center (here, the mean); for a sample, the sum of squared deviations is divided by n - 1 rather than n. The standard deviation is the square root of the variance. It can be read as a typical distance of an observation from the mean, and it is expressed in the same units as the data.

b. What is the primary advantage of using the inter-quartile range compared with the range when describing the variability of a variable?
The range uses only two values, the maximum and the minimum, so it can be very sensitive to outliers. The inter-quartile range (Q3 - Q1) is the range of the middle 50% of the values, so extreme values do not affect it.

c. Can the standard deviation ever be larger than the variance? Explain.
Yes. Because the standard deviation is the square root of the variance, it is smaller than the variance whenever the variance is greater than 1. However, when the variance is between 0 and 1, the standard deviation is larger than the variance. For example, if $s^2 = 0.5$, then $s \approx 0.71$. (When the variance is exactly 0 or 1, the two are equal.)

d. Can the variance ever be negative? Why or why not?
No. The variance is built from squared deviations, which are never negative, so it cannot be negative; it equals 0 only when all values are identical.

e. Show the formula for the coefficient of variation and explain what it is and how it can be useful in comparing the variability of different variables.
$CV = \dfrac{s}{|\bar{x}|} \times 100\%$
The CV is the ratio of the standard deviation to the absolute value of the mean, usually multiplied by 100. It expresses the standard deviation relative to the mean, which makes it easier to compare the spread of different variables, even if they are measured in different units or on different scales.
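Parts (b), (c), and (e) can be illustrated with the actresses' ages from Question 2. The sketch below uses Python's standard `statistics` module to show that removing the two outliers shrinks the range far more than the IQR, that the SD exceeds the variance only when the variance is below 1, and that the CV is unchanged when ages are rescaled from years to months. The helper names are hypothetical. Note that `statistics.quantiles` defaults to one particular quartile rule (`method="exclusive"`); other software may use a different rule and report slightly different quartiles.

```python
import math
import statistics

FEMALE_AGES = [39, 34, 26, 25, 33, 35, 35, 28, 30, 29, 61, 32, 33, 45, 29, 62]


def data_range(values):
    return max(values) - min(values)


def iqr(values):
    # Q1 and Q3 from statistics.quantiles (default "exclusive" method).
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


def cv_percent(values):
    # CV = s / |mean| * 100, with s the sample standard deviation.
    return 100 * statistics.stdev(values) / abs(statistics.mean(values))


trimmed = [a for a in FEMALE_AGES if a not in (61, 62)]

print("range:", data_range(FEMALE_AGES), "->", data_range(trimmed))  # 37 -> 20
print("IQR:  ", iqr(FEMALE_AGES), "->", iqr(trimmed))                # 9.0 -> 6.25

# The SD exceeds the variance only when the variance is between 0 and 1.
for variance in (0.5, 1.0, 4.0):
    print(variance, round(math.sqrt(variance), 2))

# Rescaling (years -> months) multiplies s and the mean by 12, so the CV is unchanged.
months = [12 * a for a in FEMALE_AGES]
print(round(cv_percent(FEMALE_AGES), 1), round(cv_percent(months), 1))
```

Dropping two of sixteen values cuts the range by 17 years but the IQR by less than 3, which is the resistance to outliers described in part (b).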