The Pretest! Pretest! Pretest! Assignment (Example 2)

Similar documents
Simple Linear Regression the model, estimation and testing

Pitfalls in Linear Regression Analysis

1.4 - Linear Regression and MS Excel

Midterm STAT-UB.0003 Regression and Forecasting Models. I will not lie, cheat or steal to gain an academic advantage, or tolerate those who do.

Business Statistics Probability

Choosing a Significance Test. Student Resource Sheet

AP Statistics. Semester One Review Part 1 Chapters 1-5

CHAPTER 2: TWO-VARIABLE REGRESSION ANALYSIS: SOME BASIC IDEAS

MMI 409 Spring 2009 Final Examination Gordon Bleil. 1. Is there a difference in depression as a function of group and drug?

Chapter 23. Inference About Means. Copyright 2010 Pearson Education, Inc.

Regression Discontinuity Analysis

The Logic of Data Analysis Using Statistical Techniques M. E. Swisher, 2016

AP Stats Chap 27 Inferences for Regression

Lessons in biostatistics

Examining differences between two sets of scores

Bangor University Laboratory Exercise 1, June 2008

10. LINEAR REGRESSION AND CORRELATION

STP 231 Example FINAL

Sample Exam Paper Answer Guide

Following is a list of topics in this paper:

Dr. Kelly Bradley Final Exam Summer {2 points} Name

Statistics 2. RCBD Review. Agriculture Innovation Program

Eating and Sleeping Habits of Different Countries

Unit 1 Exploring and Understanding Data

Statistical reports Regression, 2010

Stepwise method Modern Model Selection Methods Quantile-Quantile plot and tests for normality

Still important ideas

IAPT: Regression. Regression analyses

Still important ideas

EXPERIMENTAL RESEARCH DESIGNS

Exploration and Exploitation in Reinforcement Learning

An Introduction to Bayesian Statistics

Lab 4 (M13) Objective: This lab will give you more practice exploring the shape of data, and in particular in breaking the data into two groups.

Problem #1 Neurological signs and symptoms of ciguatera poisoning as the start of treatment and 2.5 hours after treatment with mannitol.

WDHS Curriculum Map Probability and Statistics. What is Statistics and how does it relate to you?

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo

Part 1. Online Session: Math Review and Math Preparation for Course 5 minutes Introduction 45 minutes Reading and Practice Problem Assignment

Political Science 15, Winter 2014 Final Review

Section 3.2 Least-Squares Regression

F1: Introduction to Econometrics

Chapter 25. Paired Samples and Blocks. Copyright 2010 Pearson Education, Inc.

Chapter 3: Examining Relationships

Organizing Data. Types of Distributions. Uniform distribution All ranges or categories have nearly the same value a.k.a. rectangular distribution

Biostatistics. Donna Kritz-Silverstein, Ph.D. Professor Department of Family & Preventive Medicine University of California, San Diego

Estimation. Preliminary: the Normal distribution

Stat Wk 9: Hypothesis Tests and Analysis

TEACHING REGRESSION WITH SIMULATION. John H. Walker. Statistics Department California Polytechnic State University San Luis Obispo, CA 93407, U.S.A.

1 Simple and Multiple Linear Regression Assumptions

Results & Statistics: Description and Correlation. I. Scales of Measurement A Review

Understandable Statistics

Chapter 10 Quasi-Experimental and Single-Case Designs

9 research designs likely for PSYC 2100

Hypothesis Testing. Richard S. Balkin, Ph.D., LPC-S, NCC

Calculating Regression using R. Angus Tsui, Grant Merkel, Chris Collins

Research Methods in Forest Sciences: Learning Diary. Yoko Lu December Research process

One-Way ANOVAs t-test two statistically significant Type I error alpha null hypothesis dependant variable Independent variable three levels;

6. Unusual and Influential Data

If you could interview anyone in the world, who. Polling. Do you think that grades in your school are inflated? would it be?

UNIVERSITY OF TORONTO SCARBOROUGH Department of Computer and Mathematical Sciences Midterm Test February 2016

Assessing Agreement Between Methods Of Clinical Measurement

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Midterm, 2016

One-Way Independent ANOVA

Missy Wittenzellner Big Brother Big Sister Project

11/18/2013. Correlational Research. Correlational Designs. Why Use a Correlational Design? CORRELATIONAL RESEARCH STUDIES

Statistical Assessment of the Global Regulatory Role of Histone. Acetylation in Saccharomyces cerevisiae. (Support Information)

Readings: Textbook readings: OpenStax - Chapters 1 13 (emphasis on Chapter 12) Online readings: Appendix D, E & F

STATISTICS 8 CHAPTERS 1 TO 6, SAMPLE MULTIPLE CHOICE QUESTIONS

YSU Students. STATS 3743 Dr. Huang-Hwa Andy Chang Term Project 2 May 2002

MULTIPLE LINEAR REGRESSION 24.1 INTRODUCTION AND OBJECTIVES OBJECTIVES

Content. Basic Statistics and Data Analysis for Health Researchers from Foreign Countries. Research question. Example Newly diagnosed Type 2 Diabetes

Correlation vs. Causation - and What Are the Implications for Our Project? By Michael Reames and Gabriel Kemeny

Pooling Subjective Confidence Intervals

Problem Set 3 ECN Econometrics Professor Oscar Jorda. Name. ESSAY. Write your answer in the space provided.

MEA DISCUSSION PAPERS

Exemplar for Internal Assessment Resource Mathematics Level 3. Resource title: Sport Science. Investigate bivariate measurement data

Study Guide for the Final Exam

3 CONCEPTUAL FOUNDATIONS OF STATISTICS

Peer review on manuscript "Multiple cues favor... female preference in..." by Peer 407

STATISTICS INFORMED DECISIONS USING DATA

Glossary From Running Randomized Evaluations: A Practical Guide, by Rachel Glennerster and Kudzai Takavarasha

Lesson 9 Presentation and Display of Quantitative Data

STA Module 9 Confidence Intervals for One Population Mean

HW 3.2: page 193 #35-51 odd, 55, odd, 69, 71-78

STATISTICS AND RESEARCH DESIGN

MEASURES OF ASSOCIATION AND REGRESSION

Title: A robustness study of parametric and non-parametric tests in Model-Based Multifactor Dimensionality Reduction for epistasis detection

South Australian Research and Development Institute. Positive lot sampling for E. coli O157

Non Muscle Invasive Bladder Cancer (NMIBC) Experts Discuss Treatment Options. Part II: The Future Treatment of NMIBC

Using the ISO characterized reference print conditions to test an hypothesis about common color appearance

Behavioral Data Mining. Lecture 4 Measurement

Planning Sample Size for Randomized Evaluations.

Correlational Research. Correlational Research. Stephen E. Brock, Ph.D., NCSP EDS 250. Descriptive Research 1. Correlational Research: Scatter Plots

Chapter 9: Answers. Tests of Between-Subjects Effects. Dependent Variable: Time Spent Stalking After Therapy (hours per week)

Statistical Techniques. Masoud Mansoury and Anas Abulfaraj

Research Methods 1 Handouts, Graham Hole,COGS - version 1.0, September 2000: Page 1:

Non-parametric methods for linkage analysis

Reflection Questions for Math 58B

Statistics and Probability

Analyzing binary outcomes, going beyond logistic regression

Transcription:

The Pretest! Pretest! Pretest! Assignment (Example 2) May 19, 2003 1 Statement of Purpose and Description of Pretest Procedure When one designs a Math 10 exam one hopes to measure whether a student s ability to wield the statistical ideas covered in the course. However, how well a students takes a particular exam may in fact not reflect such a goal. Here I describe a trick used in our math 10 exam to help determine whether our exam really did measure a student s ability to wield the statistical ideas covered in the course. 1.1 Quantifying My Hopes In order to test whether a person posses a certain skill it often useful to develop multiple and distinct ways of a assessing that skill. If two such statistics are really measuring the same sort of thing, then they should be correlated in some way to each other. One might hope this will be a linear correlation and hence that the strength of such a linear correlation could be used to give some indication of how well these different measurements are capturing the same underlying quality. Our midterm exam is an example of such a measurement system. It was intentionally broken into two rather different parts. Part 1, which was comprised of problems very much so like the exercises from the text; and part 2, which examined the student s grasp of the material via the exploration of a pair of experiments performed in class. As the instructor, I was hoping that both these section were measuring the same sort thing, namely the a student s ability to wield the statistical ideas covered in the course. The fact that the exam was broken into two parts allows me to test whether these different measurements where measuring the same sort of thing. Hence, I will do what I can do, and test the correlation between the performances on the two exam parts. 1

To be more precise, I will compute the scatter plot, the line of best fit, strength of the correlation, and then test this correlation in order to estimate the P-value associated to this correlation under the null hypothesis that no such correlation exist. If this hypothesis test comes back as statistically significant I will accept this correlation as a real phenomena. On an interpretive level, if the strength of the correlation is above.5 and the P- value is less than 0.05, then I will choose to chose to believe that the exam, at least in part, truly reflected the students ability to wield the statistical ideas covered in the course. Making such an interpretive conclusion from such an experiment is very common, though, and I need to emphasize, this hypothesis test itself ONLY tells us only that the such scores are likely to be correlated. I would need to perform a more detailed scientific study to determine whether or not the causal conclusion concerning the student s ability has any scientific merit. 1.2 The Data Part 1 of this exam was graded by my assistant and Part 2 was graded by myself. Hence I have eliminated any desire for correlation bias. The grade are as follows. 2

Part1 Part2 54 39 43 15 44 26 34 15 48 44 55 42 44 32 48 32 47 32 55 45 46 31 43 42 35 6 53 39 38 18 35 12 33 12 33 11 35 10 33 22 50 45 38 26 43 36 53 42 45 36 55 44 44 14 50 41 55 43 36 18 I randomly permuted the above grades in order to protect my students identities. (Each of my students should find a pair of scores corresponding to you. ) In the figure, we see the scatter plot of this data. The correlation coefficient is 0.87 while the strength of this correlation is 75.6 percent. Further more n = 30, b 1 = 1.47, b 0 = 35.74 and the s b1 from section 9.3 satisfies s b1 =.157. In particular the line of best fit will be dented as ŷ = b 1 x + b 0 = 1.47x 35.74. 3

40 30 20 10 35 40 45 50 55 x Figure 1: Here we see a scatter plot of our exam data. Namely each point corresponds to a exam. The score associated to part 1 of the exam is the x coordinate of the point, and the score associated to part 2 of the exam is the y-coordinate of the point. 4

15 10 5 0 5 35 40 45 50 55 10 15 Figure 2: Here is our residual plot. Namely I have graphed the x-coordinate value of our points from figure 1, against the difference between the actual y-coordinate values from figure 1 and the corresponding ŷ value. Real ŷ is the second part s score as estimated from the line of best fit. 1.3 The Hypothesis Test The data appears nicely linearly correlated. If I want to test this correlation I can test the positivity of slope β 1 by using a t-test with n 2 = 28 degrees of freedom as described in section 9.3. To be more precise, I test the alternative hypothesis that β 1 > 0 against the null hypothesis that β 1 = 0 wit ha levl of significance of α = 0.01. In order to honestly perform such a test, I need to verify that my data satisfies the properties of the linear regression model. For a small data set one nice way to asses this is to look at the plot of the residuals, see figure 2. Here I will enumerate the the properties of the linear regression model as was done in section 9.2. 3. Homoscedasticity. Looking at the residuals in figure 2 we see that this assumption appears reasonable. Namely, the variance is not constant but there is not a very dramatic change. If anything, it appears that the variance for higher x-coordinate scores shrinks a little. This is almost certainly do to the fact the exam failed to separated scores at the high end. We can see this issue in the histogram of the scores in figure 3. 2. Linearity. The data looks like the relationship is linear. Looking at the 5

0.05 0.04 0.03 0.02 0.01 0 10 20 30 40 Figure 3: Here we see a histogram of part 2 of the exam. The general shape of this picture is quite typical for a poorly written exam, namely it is smashed on the left. Typically this means that the exam was not hard enough to separate out the students at the high end. Unlike most such curves, this one also has a certain bimodality to it. Upon consulting with my Math 10 class we determined that this bimodality was likely to due to the fact that the two question in part 2 of the exam were of very different difficulty levels. Most people had a good idea how to handle the first of the two questions, while some positive fraction of the class were unable to even get started on the second question. Hence the bimodality. 6

0.08 0.06 0.04 0.02 15 10 5 5 10 15 Figure 4: Here is histogram of the residuals. Looking at our residual data we anticipate the decrease in the variance among higher scores explains why this histogram is a bit tight around zero. Certainly it is reasonably bell shaped. residuals linearity looks very reasonable indeed, namely the graph does appear to hover around the line x = 0 as needed. 4. Independence should be fine here, since the honor principle assure me that who a student s neighbor was when they took the exam will not effect the exams score. 1. Normality. Experience tells me to expect Bell shaped curves in general when one fixes the score on one exam and compares another exam score. Since the data nearly satisfies part 3 above, we can test this by simply seeing if the histogram of the residuals has a bell type shape. Looking at this graph in figure 4 we see this assumption looks pretty good. Once again the distribution is perhaps a bit tight around 0, due to the fact that the variance decreased among the high score, see figure 3. Since we have basically satisfied 1-4 of the regression model we may utilize the hypothesis test on b 1, described above In other words under the null hypothesis, b 1 s b1 = 9.311 is basically a t distribution as 30 2 = 28 degrees of freedom. In particular, the P value is nearly zero and such correlation is virtually certain to exist and we easily 7

pass our hypothesis test. In fact, the 99 percent confidence interval confidence interval for the β 1 I based on this data is [b 1 t α/2 s b1, b 1 +t α/2 s b1 ] = [1.47 (2.76)(.157), 1.47+(2.76)(.157)] = [1.04, 1.90]. 2 Conclusions Looking at my data it looks like I failed to make a really good exam with respect to the linear correlation model s needs. The bulk of the problem is the phenomena explored in figure 3, namely, that the exam failed to separate at the high end of the score spectrum. In the process, the variance of my residuals at the high end was much smaller. This is a violation of homoscedaticity. We saw that this also affected our exploration of the Normality assumption. It seems that these problems could be avoided by attempting to make a more challenging exam, though I d need to be careful not to make it to hard (in order to avoid a clumping of the values at the low end). Clearly such an attempt can be made in preparing the final. More importantly, my correlation measurements did not necessarily quantify whether the exam really measured a student s ability to wield the statistical ideas covered in the course. In particular, since both parts of the exam were taken at the same time it does not take into account the possibility that a given student is just having a bad day. To arrive at a more meaningful sense of whether this goal has been accomplished, I will need to compare performances between more distinct measurement of a student s ability. It would be nice to compare measurement made a different times and in more transparently distinct ways. For example, the correlation between mini-project grades and the final exam scores would be much more revealing, since the mini-projects and final exams are both attempting to measure a student s ability to wield the statistical ideas covered in the course but in VERY different ways. Hence, a more suitable second pretest will be to look at the correlation between the Final and the Mini-project, rather than the correlation between two parts of the final. On an interpretive level, if the strength of the such a correlation is above.5 and the P-value is less than 0.05, then I will happily chose to believe that the assessment strategy used to determine my students grades was, at least in part, truly reflecting my students ability to wield the statistical ideas covered in the course. 8