M 140 Test 1 A

Name (1 point)

SHOW YOUR WORK FOR FULL CREDIT!

Problem   Max. Points   Your Points
1-10      10
11        10
12         3
13         4
14        18
15         8
16         7
17        14
Total     75

Multiple choice questions (1 point each)

For questions 1-3 consider the following: The distribution of scores on a certain statistics test is strongly skewed to the left.

1. How would the mean and the median compare for this distribution?
a) mean < median
b) mean = median
c) mean > median
d) Can't tell without more information
Answer: a). When a distribution is skewed to the left, the tail is on the left. Since the tail pulls the mean, the mean is less than the median.

2. Which set of measures of center and spread are more appropriate for the distribution of scores?
a) Mean and standard deviation
b) Median and interquartile range
c) Mean and interquartile range
d) Median and standard deviation
Answer: b). For skewed distributions the median and the interquartile range are the appropriate measures of center and spread because they are resistant to outliers.

3. What does this suggest about the difficulty of the test?
a) It was an easy test
b) It was a hard test
c) It wasn't too hard or too easy
d) It is impossible to tell
Answer: a). If the distribution is skewed to the left, then most of the values are to the right, toward the higher values. Thus, more students got higher scores, which suggests that the test was easy.

4. Which one of the following variables is NOT categorical?
a) whether or not an individual has pierced ears
b) religion
c) length of a rattlesnake
d) zipcode
Answer: c). All the other variables are categorical; an average doesn't make sense for them.

5. What percent of the observations in a distribution lie between the first quartile Q1 and the third quartile Q3?
a) Approximately 25%
b) Approximately 50%
c) Approximately 75%
d) Approximately 100%

Answer: b). Since Q1 is the first quartile (25%) and Q3 is the third quartile (75%), approximately 50% of the data lie between them.

6. Which of these statements are FALSE?
a) There is a very weak linear relationship between gender and height because we found a correlation of 0.89.
b) Plant height and leaf height are strongly correlated because the correlation coefficient is -1.41.
c) If we change the units of measurements, the correlation coefficient will change.
d) All of the above.
e) None of the above.
Answer: d). All of the statements are false. a) is false because a correlation of 0.89 means a strong, not a very weak, relationship. b) is false because the correlation coefficient cannot be less than -1. c) is false because the correlation coefficient doesn't change when we change the units of measurement.

7. Consider two data sets A and B with more than 3 values. The sets are identical except that the maximum value of data set B is three times greater than the maximum value of set A. Which of the following statements is NOT true?
a) The median of set A is the same as the median of set B.
b) The mean of set A is smaller than the mean of set B.
c) The standard deviation of set A is the same as the standard deviation of set B.
d) The range of set A is smaller than the range of set B.
Answer: c). The standard deviation takes all the values into consideration. Thus, data set B has the higher standard deviation.

8. A vending machine automatically pours coffee into cups. The amount of coffee dispensed into a cup is roughly bell shaped with a mean of 7.6 ounces. From the Standard Deviation Rule we know that 95% of the cups contain an amount of coffee between 6.8 and 8.4 ounces. Thus, the standard deviation of the distribution is approximately equal to:
a) 0.8 ounces
b) 0.4 ounces
c) 1.2 ounces
d) 0.2 ounces
Answer: b). From the Standard Deviation Rule we know that 95% of the data lie within two standard deviations of the mean. Thus, the standard deviation must be (8.4 - 7.6)/2 = 0.4 ounces.

9. A study of the salaries of professors at Smart University shows that the median salary for female professors is considerably less than the median male salary. Further investigation shows that the median salaries for male and female professors are about the same in every department (English, Physics, etc.) of the university. This apparent contradiction is an example of
a) extrapolation
b) Simpson's paradox
c) causation
d) correlation

Answer: b). Simpson's paradox demonstrates that a great deal of care has to be taken when combining small data sets into a large one. Sometimes conclusions from the large data set are exactly the opposite of the conclusions from the smaller sets, and this is the case here. The conclusion was reversed as we included the lurking variable, department.

10. Which of the following statements is FALSE?
a) The only way the standard deviation can be 0 is when all the observations have the same value.
b) If you interchange the explanatory variable and the response variable, the correlation coefficient remains the same.
c) If the correlation coefficient between two variables is 0, that means that there is no relationship between the two variables.
d) The correlation coefficient has no units.
e) The standard deviation has the same units as the data.
Answer: c). It is false because r = 0 only means that there is no linear relationship between the two variables. There could still be a very strong non-linear relationship.

11. TRUE or FALSE? Circle either T or F for each statement below (1 point each)

T F  Consider the following data set: 2 3 5 7 8 10 12 15 20 31. According to the 1.5(IQR) rule, 31 is not an outlier.
False. Q1 = 5, Q3 = 15, so IQR = 10. According to the 1.5(IQR) rule, Q3 + 1.5(IQR) = 15 + 1.5(10) = 30. Since 31 is higher than 30, it is considered an outlier according to this rule.

T F  Two variables have a negative linear correlation. That means that the response variable increases as the explanatory variable increases.
No, negative linear correlation means that as the explanatory variable increases, the response variable decreases.

T F  If in each of the three schools in a certain school district, a higher percentage of girls than boys gave out Valentines, then it must be true for the district as a whole that a higher percentage of girls than boys gave out Valentines.
It could be, but it is not necessarily true. This goes back to Simpson's paradox.

T F  We can display one categorical variable graphically with a histogram, a stemplot, or a boxplot.
No, these are used to display a quantitative variable.

T F  A correlation of 0.8 indicates a stronger relationship than a correlation of -0.8.
No, assuming a linear relationship, 0.8 is as strong as -0.8. The sign only shows the direction.

For the following five statements consider this: Infant mortality rates per 1000 live births for the following regions are summarized with side-by-side boxplots (www.worldbank.org):

Region 1: Asia (South and East) and the Pacific
Region 2: Europe and Central Asia
Region 3: Middle East and North Africa
Region 4: North and Central America and the Caribbean
Region 5: South America
Region 6: Sub-Saharan Africa

T F  About 75% of the countries in Region 1 have an infant mortality rate less than about 75, while about 75% of the countries in Region 6 have an infant mortality rate more than about 75.

T F  The distribution of infant mortality rates in Region 4 is left skewed.
No, it's right skewed. You can see the longer tail in the higher values.

T F  The median infant mortality rate in Region 4 is about the same as the first quartile of Region 3.

T F  The variability in the infant mortality rate is the smallest in Regions 2 and 5.

T F  The infant mortality rate in all countries except for one in Region 2 is lower than the mortality rate of 50% of the countries in Region 1.
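The 1.5(IQR) check from the first statement of question 11 can be reproduced in a few lines. This is a minimal sketch, assuming quartiles are taken as the medians of the lower and upper halves of the sorted data (the convention that gives Q1 = 5 and Q3 = 15 for this data set); the function name `iqr_outliers` is illustrative and not part of the test.

```python
from statistics import median


def iqr_outliers(values):
    """Return (q1, q3, lower_fence, upper_fence, outliers) by the 1.5(IQR) rule.

    Assumption: Q1 and Q3 are the medians of the lower and upper halves of
    the sorted data, leaving out the middle value when the count is odd.
    """
    data = sorted(values)
    half = len(data) // 2
    q1 = median(data[:half])
    q3 = median(data[len(data) - half:])
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    return q1, q3, lower_fence, upper_fence, outliers


q1, q3, low, high, outliers = iqr_outliers([2, 3, 5, 7, 8, 10, 12, 15, 20, 31])
print(q1, q3, low, high, outliers)
```

Software packages often use other quartile conventions, so their fences can differ slightly from the hand calculation.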

12. (3 points) Match each of the five scatterplots with its correlation.

Correlation   0.7   -0.75   -0.95   0    0.95
Scatterplot   b.    a.      e.      d.   c.

13. (4 points) Answer the following questions:
a) Which measure of spread indicates variation about the mean? The standard deviation
b) Which graphical display shows the median and data spread about the median? boxplot

14. (18 points) For each of the situations described below, identify the explanatory variable and the response variable, and indicate if they are quantitative or categorical. Also, write the appropriate graphical display for each situation.

a) You want to explore the relationship between nationality and IQ scores.
The explanatory variable is: nationality.
The response variable is: IQ scores.
The explanatory variable is categorical.
The response variable is quantitative.
Therefore, this is an example of Case I.
An appropriate graphical display would be: side-by-side boxplots.

b) You want to explore the relationship between gender and party affiliation.
The explanatory variable is: gender.
The response variable is: party affiliation.
The explanatory variable is categorical.
The response variable is categorical.
Therefore, this is an example of Case II.
An appropriate graphical display would be: double bar graph.

c) You want to explore the relationship between the weight of the car and its mpg (miles per gallon).
The explanatory variable is: weight of the car.
The response variable is: mpg.
The explanatory variable is quantitative.
The response variable is quantitative.
Therefore, this is an example of Case III.
An appropriate graphical display would be: scatterplot.

15. (8 points) Thirty students in an experimental psychology class used various techniques to train rats to move through a maze. The running times of the rats (in minutes) are recorded at the end of the experiment. The following stemplot shows the results, where 3 | 1 represents 3.1 minutes. The summary statistics are also provided.

0 | 6
1 | 0167799
2 | 0047
3 | 123678
4 | 02455
5 | 236
6 | 0
7 | 6
8 | 0
9 | 1

N = 30
Mean = 3.698
Median = 3.45
StDev = 2.118
Minimum = 0.6
Maximum = 9.1
Q1 = 1.9
Q3 = 4.5

a) Briefly describe the shape of the distribution from the stemplot.
Skewed to the right
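The stemplot can be expanded back into its 30 values to cross-check the order-based summary statistics. This is a hedged sketch: the dictionary `stemplot` is simply a transcription of the display, values are held in tenths of a minute to avoid floating-point noise, and quartiles are assumed to be the medians of the lower and upper halves. The mean is left out of the check, because a mean recomputed from leaves rounded to one decimal need not equal the reported 3.698 exactly.

```python
from statistics import median

# Stem -> leaves, transcribed from the stemplot (3 | 1 represents 3.1 minutes).
stemplot = {
    0: "6", 1: "0167799", 2: "0047", 3: "123678", 4: "02455",
    5: "236", 6: "0", 7: "6", 8: "0", 9: "1",
}

# Running times in tenths of a minute, sorted from smallest to largest.
tenths = sorted(10 * stem + int(leaf)
                for stem, leaves in stemplot.items()
                for leaf in leaves)

n = len(tenths)
half = n // 2
summary = {
    "N": n,
    "Minimum": tenths[0] / 10,
    "Q1": median(tenths[:half]) / 10,       # median of the lower half
    "Median": median(tenths) / 10,
    "Q3": median(tenths[n - half:]) / 10,   # median of the upper half
    "Maximum": tenths[-1] / 10,
}
print(summary)
```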

b) Check the mean and the median. What do you think, did I switch them to trick you, or do they seem to be correct? Explain.
They are correct. When the distribution is skewed to the right, the mean is usually higher than the median. So I didn't switch them.

c) Find the value of the interquartile range, and interpret it clearly in the context of the experiment.
IQR = Q3 - Q1 = 4.5 - 1.9 = 2.6
The interquartile range is the range of the middle 50% of the data. In this example, it means that 16 of the rats completed the maze in between 1.9 and 4.5 minutes. Seven rats completed the maze in less than 1.9 minutes, and seven completed it in more than 4.5 minutes.

d) Which one of the following boxplots represents the above data? Circle your answer.
9.1 is a suspected outlier, and according to the 1.5(IQR) rule it is an outlier: Q3 + 1.5(IQR) = 4.5 + 1.5(2.6) = 8.4, and 9.1 is above 8.4. That leaves C2 or C3 as possible answers. Since the next highest number in the list is 8.0 minutes, the correct boxplot must be C2.

16. (7 points) Consider the following two distributions. The first one (A) shows the distribution of the number of houseplants owned by a sample of 30 households in Los Angeles. The second one (B) shows, for a sample of 30 freshmen, the distribution of the number of girlfriends/boyfriends they have ever had.

a) What percent of households have one houseplant or none?
One household has no plants, and two households have 1 plant (see yellow highlights on the graph). That is 3 out of 30 households, 3/30 = 10%.

b) Which distribution has the lower standard deviation and why?
A has the lower standard deviation because the values are closer to the mean on average. Most of the values are around the center, and fewer and fewer are farther from the center.

c) Select the statement below that gives the most complete and correct statistical description of graph A.
A. The bars go from 0 to 10, increasing in height to 4, then decreasing to 10. The tallest bar is at 4. There is a gap between 8 and 10.
B. The distribution is approximately normal, with a mean of about 4 and a standard deviation of about 1.
C. Most households seem to have about 4 houseplants, but some have more or less. One household has 10 plants.
D. The distribution of the number of houseplants is somewhat symmetric and bell-shaped, with one possible outlier at 10. The typical number of houseplants owned is about 4, and the overall range is 10 plants.
A description of a distribution should always be in context, so answers A and B are out. Between C and D, D is the most complete and correct description.

17. The number of hours 21 students spent studying for a test and their scores on that test were recorded.

Hours spent studying: 0.5 1.0 2.5 4.0 4.5 5.0 5.5 5.0 6.0 6.5 7.0 7.0 8.0 3.5 3.0 1.0 2.0 5.5 6.0 3.0 7.5
Test scores:          40 41 51 50 64 69 73 75 68 93 84 90 95 50 62 51 62 69 73 58 52

The mean hours spent studying is: 4.48 hours
The standard deviation of the hours spent studying is: 2.25 hours
The mean test score is: 65.24
The standard deviation of the test scores is: 16.15
The correlation coefficient is: 0.79
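The summary statistics quoted for question 17 can be recomputed from the 21 data pairs. This is a minimal sketch using only the Python standard library; the variable names are illustrative, and the correlation is computed from its definition as the sum of products of standardized values divided by n - 1.

```python
from statistics import mean, stdev

hours = [0.5, 1.0, 2.5, 4.0, 4.5, 5.0, 5.5, 5.0, 6.0, 6.5, 7.0,
         7.0, 8.0, 3.5, 3.0, 1.0, 2.0, 5.5, 6.0, 3.0, 7.5]
scores = [40, 41, 51, 50, 64, 69, 73, 75, 68, 93, 84,
          90, 95, 50, 62, 51, 62, 69, 73, 58, 52]

# Sample means and sample standard deviations (n - 1 in the denominator).
mean_x, sd_x = mean(hours), stdev(hours)
mean_y, sd_y = mean(scores), stdev(scores)

# Correlation: sum of products of z-scores divided by n - 1.
n = len(hours)
r = sum(((x - mean_x) / sd_x) * ((y - mean_y) / sd_y)
        for x, y in zip(hours, scores)) / (n - 1)

print("hours: ", round(mean_x, 2), round(sd_x, 2))
print("scores:", round(mean_y, 2), round(sd_y, 2))
print("r:     ", round(r, 2))
```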

a) Describe the scatterplot. Make sure you mention all four features.
Form: linear
Direction: positive
Strength: moderately strong
Outliers: maybe one at about (7.5, 50)

b) Find the equation of the least squares line, and sketch the line on the plot. Use three decimal digits in your answers.
The equation of the LSRL is $Y = a + bX$, where $b = r\,\frac{s_y}{s_x}$ and $a = \bar{Y} - b\bar{X}$.
In this case, $b = r\,\frac{s_y}{s_x} = 0.79 \cdot \frac{16.15}{2.25} = 5.670$ and $a = \bar{Y} - b\bar{X} = 65.24 - 5.670(4.48) = 39.838$.
Thus, the equation of the LSRL is Y = 39.838 + 5.670X

c) Provide an interpretation of the slope in this context.
The slope is 5.670 points per hour of study. That means that students who studied an extra hour for the test got 5.670 more points on average on their tests.

d) Predict the test scores for a student who studied 1.5 hours for the test.
Y = 39.838 + 5.670X = 39.838 + 5.670(1.5) = 48.343
Using the LSRL, we can predict that a student who studied 1.5 hours for the test got about 48 points on the test.

e) Would it be OK to use the regression line to predict the test score for a student who studied 10 hours for the test? Explain.
No. That would be extrapolation.

f) Does the data support the notion that test scores are associated with the number of hours spent studying for the test? Explain briefly, giving all evidence that supports your contention.
Yes. There seems to be a positive association between the number of hours spent studying for the test and the test scores. The scatterplot shows a positive relationship, and the correlation coefficient is high. In fact, r^2 = 0.62, meaning that about 62% of the variation in the test scores can be explained by the number of hours spent studying, while 38% of the variation is attributable to factors other than the number of hours studied.

g) Can you name one lurking variable that might affect the test scores?
Maybe students' interest in the material. Number of hours slept. Test anxiety. Gender. IQ. Students' previous knowledge of the material, etc.
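The arithmetic in parts b), d), and f) can be sketched as follows. This is an illustration under stated assumptions, not the test's own method of computation: it starts from the rounded summary statistics reported in the problem and rounds the slope and intercept to three decimals at each step, as the worked answer does. The function name `predict` is hypothetical.

```python
# Summary statistics as reported in the problem statement.
r = 0.79
mean_x, sd_x = 4.48, 2.25      # hours spent studying
mean_y, sd_y = 65.24, 16.15    # test scores

# Slope and intercept of the least-squares line, rounded to three decimals
# at each step to mirror the worked answer.
b = round(r * sd_y / sd_x, 3)
a = round(mean_y - b * mean_x, 3)


def predict(hours):
    """Predicted test score for a given number of study hours."""
    return a + b * hours


print(f"Y = {a:.3f} + {b:.3f}X")
print(f"prediction at 1.5 hours: {predict(1.5):.3f}")
print(f"r^2 = {r ** 2:.2f}")
```

Fitting the line to the unrounded data pairs instead of the rounded summary statistics gives slightly different coefficients, because rounding r and the standard deviations carries through to the slope and intercept.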