The wicked learning environment of regression toward the mean

Working paper, December 2016

Robin M. Hogarth¹ & Emre Soyer²

¹ Department of Economics and Business, Universitat Pompeu Fabra, Barcelona
² Faculty of Business, Ozyegin University, Istanbul

robin.hogarth@upf.edu
emre.soyer@ozyegin.edu.tr

Abstract

The environment in which people experience regression toward the mean inhibits accurate learning and valid intuitions. In predictive tasks, regression effects are only salient in rare cases where cues take extreme values. People often experience regression away from the mean. Furthermore, errors from predictions that ignore regression effects correlate highly with those of optimal predictions. In diagnostic tasks, people fail to recognize regression effects because they are motivated to seek causal explanations. Causes are attributed to easily identifiable factors that make good stories. A simple heuristic can overcome these inferential difficulties. In predictive tasks, a 50/50 rule that gives equal weight to the cue and the mean of the target variable approximates optimal performance. In diagnostic tasks, the same rule can be used to generate non-causal counterfactuals to challenge possible causal candidates.

Keywords: regression toward the mean; wicked learning environment; experience; judgment; intuition; prediction; diagnosis.

It is well accepted that people's intuitions about regression toward the mean are deficient and that this can lead to inferential errors (Kahneman, 2011; Tversky & Kahneman, 1974). Evaluations can suffer in a wide range of decision situations where outcomes involve varying degrees of noise (Bland & Altman, 1994; Gilovich, 2008). In healthcare, placebo effects are confused with regression effects, leading to overestimations of drug effectiveness (Morton & Torgerson, 2003; McDonald, Mazzuca, & McCabe, 1983). In management, performance assessments underestimate the role of regression effects, warping employee evaluations (Denrell, Fang, & Liu, 2014). In sports, misperceptions lead to superstitions. For instance, Wikipedia features a growing list of athletes who supposedly saw their careers decline because they were featured on the cover of a magazine, a phenomenon known as the Sports Illustrated cover jinx.

It is worth recalling, however, that the concept of regression toward the mean was not formally recognized until the late 19th century and even then it took many years to define (Stigler, 1997). Perhaps it should not be surprising that people's intuitions are still incomplete today. In this paper, we argue that the environment in which people experience regression effects is often wicked in that feedback does not reinforce appropriate intuitions (Hogarth, 2001; Hogarth, Lejarraga, & Soyer, 2015). Accordingly, we propose a simple judgmental strategy that can mitigate the consequences of these difficulties.

Forward vs. backward inference

Regression effects are a consequence of imperfect correlation between two variables (Fiedler & Unkelbach, 2014). However, it is important to note that they can be experienced in two temporal directions. Consider two variables, a cue X and a target Y, and assume that X occurs before Y.

In forward inference, tasks are predictive in nature. That is, the typical question is: given the cue X, what is the value of the target Y? For example, Joe got 90 (out of 100) on the mid-term; what will he score on the final (out of 100)? Failure to account for regression toward the mean occurs when insufficient weight is given to the mean of the target Y in making a prediction based on the cue X.

In backward inference, tasks are diagnostic in nature. People know Y, observe X, and then seek to explain Y. That is, the question is: what explains Y? For example, Joe got 75 (out of 100) on the final after scoring 90 (out of 100) on the mid-term. Why? Here failure to account for regression toward the mean occurs when an inappropriate causal explanation is offered instead of an interpretation based on chance (i.e., regression effects).

Forward and backward inference tasks present the decision maker with quite different inferential challenges.

Predictive tasks

We first consider predictive tasks and ask when regression toward the mean is likely to be observed. Continuing with variables X and Y, assume that you have observed a value X = $x_c$ at or above the mean. What is the probability of observing regression toward the mean? That is, what is the probability of observing a value of Y equal to or smaller than $y_c$ when $x_c$ and $y_c$ are at the same percentiles of their respective distributions?

For concreteness, let X and Y represent the scores that a person achieves at two separate sittings of an IQ test, each distributed N(100, 15), that is, with a mean of 100 and a standard deviation of 15. Further, imagine that the score was 135 at the first sitting (X). What are the chances that the second test (Y) will exhibit regression toward the mean, that is, that the score will be less than 135?

Table 1 reports the probability of observing regression toward the mean for different levels of IQ scores and correlations between the test sittings. The rows denote percentiles (and their corresponding IQ scores) and the columns levels of correlation between X and Y. At the 99th percentile (IQ = 135), regression toward the mean is almost certain (probability 0.98), but only if the correlation between X and Y is small (0.1). The probability drops to 0.70 when the correlation between X and Y is 0.9.

                      Correlation $\rho_{X,Y}$
Percentile   IQ     0.1    0.3    0.5    0.7    0.9
0.99         135    0.98   0.96   0.91   0.84   0.70
0.95         125    0.93   0.89   0.83   0.76   0.65
0.85         116    0.83   0.78   0.73   0.67   0.59
0.75         110    0.73   0.69   0.65   0.61   0.56
0.65         106    0.64   0.61   0.59   0.56   0.54
0.55         102    0.55   0.54   0.53   0.52   0.51
0.50         100    0.50   0.50   0.50   0.50   0.50

Note: The cell entries are $P(Y \le y_c \mid X = x_c)$, where $y_c$ and $x_c$ are at the same percentiles of their distributions.

Table 1. Probabilities of observing regression toward the mean as a function of percentile score on IQ test and correlation between first and second test scores.
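The entries of Table 1 can be reproduced with a short calculation. The sketch below is illustrative rather than the authors' own code, and it assumes that X and Y are jointly (bivariate) normal with equal means and standard deviations. Under that assumption, a cue at standardized value z implies that Y given X is normal with mean ρz and variance 1 − ρ², so the probability in question is Φ(z·√((1 − ρ)/(1 + ρ))). The function name `prob_regression` is hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def prob_regression(percentile: float, rho: float) -> float:
    """P(Y <= y_c | X = x_c) when x_c and y_c sit at the same percentile.

    Assumes (X, Y) is bivariate normal with equal means and standard
    deviations; this is an assumption of the sketch, not stated code
    from the paper.
    """
    z = STD_NORMAL.inv_cdf(percentile)  # standardized value of the cue
    # (z - rho*z) / sqrt(1 - rho^2) simplifies to z * sqrt((1-rho)/(1+rho))
    return STD_NORMAL.cdf(z * ((1 - rho) / (1 + rho)) ** 0.5)


if __name__ == "__main__":
    correlations = (0.1, 0.3, 0.5, 0.7, 0.9)
    for p in (0.99, 0.95, 0.85, 0.75, 0.65, 0.55, 0.50):
        row = "  ".join(f"{prob_regression(p, r):.2f}" for r in correlations)
        print(f"{p:.2f}  {row}")
```

Because the expression depends only on the percentile and the correlation, the same probabilities hold for any scale, not just IQ scores.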

Observing regression effects is even less likely for lower and more common scores. For example, at the 75th percentile (IQ = 110), observing regression toward the mean is almost an even bet (0.56) when the X-Y correlation is 0.9. Overall, whereas all entries in Table 1 are greater than or equal to 0.50, they are large (> 0.75) only when percentiles are high (the 85th or above) and, except at the most extreme percentiles, correlations are low (e.g., 0.3 or less at the 85th percentile). By definition, high percentiles occur rarely, and thus within a particular phenomenon people would often observe regression away from the mean as well as toward it. Hence, experience does not always lead to observing regression toward the mean.

In theory, people should be able to learn about regression effects by making predictions and experiencing feedback. And indeed, when situations involve many judgments over time about the same process and feedback is clear and immediate, there is some practical awareness of regression (Hogarth & Soyer, 2011). For example, there is no expectation that the number one seed will always win major tennis tournaments. On the other hand, when the task structure is unclear or unfamiliar, people do experience difficulties (Kahneman & Tversky, 1973; Cox & Summers, 1987).

In their seminal paper, Kahneman and Tversky (1973) proposed that people use a strategy that matches characteristics of the cue X with those of the target Y (the so-called representativeness heuristic). For example, if the value of X is high (say at the 90th percentile), the value of Y is also predicted to be high (at the 90th percentile). What are the consequences of a matching strategy that ignores regression effects in this way? Consider two individuals. One uses the optimal regression strategy; the other uses a matching strategy. Over several cases, both make predictions and receive feedback. How does this feedback differ for the two individuals? Specifically, what is the correlation between the errors of the two strategies?
To answer, consider the simple regression of Y on X where both Y and X have zero means and unit standard deviations and $\varepsilon$ is an error term distributed $N(0, \sigma_\varepsilon)$:

$$Y = \beta x + \varepsilon \tag{1}$$

The optimal prediction of Y for any X would be $\beta x$. Now consider a matching strategy where the prediction of Y is always made by reporting the value of X observed. Thus, comparing errors from the regression and matching strategies, we have

Errors of regression model:

$$(\beta x + \varepsilon) - \beta x = \varepsilon \tag{2}$$

and

Errors of matching model:

$$(\beta x + \varepsilon) - x = x(\beta - 1) + \varepsilon \tag{3}$$

The correlation between the prediction errors of the two models can be written as

$$\frac{\mathrm{Cov}\left[X(\beta - 1) + \varepsilon,\ \varepsilon\right]}{\left[\sigma_X^2(\beta - 1)^2 + \sigma_\varepsilon^2\right]^{1/2}\left[\sigma_\varepsilon^2\right]^{1/2}}$$

which, because $\sigma_X = 1$ and the covariance equals $\sigma_\varepsilon^2$, can be simplified to

$$\frac{\sigma_\varepsilon}{\left[(\beta - 1)^2 + \sigma_\varepsilon^2\right]^{1/2}} \tag{4}$$

The first column (1) of Table 2 reports the correlation between the prediction errors of the optimal (regression) and matching strategies for different levels of $\rho_{X,Y}$, the correlation between X and Y. As can be seen, this is high across the range of $\rho_{X,Y}$. Thus, if an individual were to relinquish matching and adopt the optimal strategy, there would be little difference in predictive accuracy. Feedback in the naturally occurring environment is unlikely to induce the use of more appropriate optimal regression strategies.

$\rho_{X,Y}$   Matching strategy (1)   Mean strategy (2)   50/50 strategy (3)
0.1            0.74                    0.99                0.93
0.2            0.77                    0.98                0.96
0.3            0.81                    0.95                0.98
0.4            0.84                    0.92                0.99
0.5            0.87                    0.87                1.00
0.6            0.89                    0.80                0.99
0.7            0.92                    0.71                0.96
0.8            0.95                    0.60                0.89
0.9            0.97                    0.44                0.74

Table 2. Correlation of errors of different models with those of the optimal regression model across the possible range of uncertainty in the X-Y relation.

Another naïve but probably unlikely strategy a person could use is to ignore the cue X and always predict using the mean of the target Y. How do the errors from this strategy correlate with those of the optimal model? The answer is provided in the second column (2) of Table 2. (See Appendix for derivation.) The mean strategy is quite effective. It is better than the matching strategy for low values of $\rho_{X,Y}$ but less effective for high values. Clearly, matching works well when $\rho_{X,Y}$ is high, and the mean strategy works well when $\rho_{X,Y}$ is low.

To use the optimal model, one needs to know the value of $\rho_{X,Y}$. However, in many if not most cases this is unknown. Thus, even if the decision maker knows that predictions should be regressed, what should be done? A simple default strategy is to give equal weight to the cue X and the mean of the target Y, thereby assuming $\rho_{X,Y}$ = 0.5. The consequences of this 50/50 strategy are presented in the third column (3) of Table 2 (see Appendix for derivation). The strategy is, of course, optimal when $\rho_{X,Y}$ = 0.5, but it is also highly effective between $\rho_{X,Y}$ = 0.2 and $\rho_{X,Y}$ = 0.7. Indeed, a decision maker should always use the 50/50 strategy unless she knows that $\rho_{X,Y}$ is close to 0 (when the mean should be used) or close to 1 (when the matching strategy is superior).

To illustrate this, we examined the performance of the different strategies as measured by the mean absolute deviations (MADs) of their predictions. We simulated the underlying regression process of our IQ scenario, assuming that both X and Y are normally distributed with means of 100 and standard deviations of 15, and varied the correlation between X and Y from 0.1 to 0.9. The results of our simulation, based on 1,000,000 trials per case, are reported in Table 3. With the exception of the mean strategy (where MADs are constant across different values of $\rho_{X,Y}$), MADs decrease as the task becomes more predictable. For example, at $\rho_{X,Y}$ = 0.1, the MAD for the optimal regression strategy is 11.91, but this is 5.22 when $\rho_{X,Y}$ = 0.9. Of particular interest is how the non-optimal strategies compare with the optimal. To illustrate this, we express (in the right panel of Table 3) the ratios of the MADs of the three non-optimal strategies, which do not require knowledge of $\rho_{X,Y}$, to the MADs of the optimal regression strategy. The results show the robustness of the 50/50 strategy, which is close to optimal (ratio of 1.00) over the mid-range of values of $\rho_{X,Y}$, and reinforce the comparisons made by examining correlations between errors of models.

In summary, the feedback that people receive after making predictions does not provide clear signals about the nature of regressive phenomena. In addition, judgmental strategies that ignore regression can often approximate the outcomes of optimal strategies and thereby seem satisfactory. Learning about regression through experience in the natural environment is difficult. Nonetheless, a simple solution can capture the benefits of accounting for regression effects. A 50/50 strategy that gives equal weight to the cue (X) and the mean of the target (Y) has close to optimal performance across a wide range of levels of correlation between cue and target.
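The simulation described above can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code: it assumes a bivariate normal process for X and Y, uses far fewer trials than the 1,000,000 reported so that it runs quickly in pure Python, and the names `simulate_mads`, `MEAN`, and `SD` are hypothetical.

```python
import random

MEAN, SD = 100.0, 15.0  # IQ scenario: both sittings have mean 100, sd 15


def simulate_mads(rho: float, n_trials: int = 100_000, seed: int = 0) -> dict:
    """Mean absolute deviation of four prediction strategies.

    Assumes (X, Y) is bivariate normal with correlation rho; this is an
    assumption of the sketch.
    """
    rng = random.Random(seed)
    noise_sd = (1 - rho ** 2) ** 0.5  # sd of the standardized error term
    totals = {"optimal": 0.0, "matching": 0.0, "mean": 0.0, "50/50": 0.0}
    for _ in range(n_trials):
        zx = rng.gauss(0, 1)
        zy = rho * zx + noise_sd * rng.gauss(0, 1)
        x, y = MEAN + SD * zx, MEAN + SD * zy
        predictions = {
            "optimal": MEAN + rho * (x - MEAN),  # regression prediction
            "matching": x,                        # report the cue itself
            "mean": MEAN,                         # ignore the cue
            "50/50": 0.5 * x + 0.5 * MEAN,        # equal weights
        }
        for name, pred in predictions.items():
            totals[name] += abs(y - pred)
    return {name: total / n_trials for name, total in totals.items()}


if __name__ == "__main__":
    for rho in (0.1, 0.5, 0.9):
        mads = simulate_mads(rho)
        ratios = {k: v / mads["optimal"] for k, v in mads.items()}
        print(rho, {k: round(v, 2) for k, v in mads.items()},
              {k: round(v, 2) for k, v in ratios.items()})
```

With enough trials the MADs should settle near 15·√(2/π) times the standard deviation of each strategy's standardized error, which is why the mean strategy stays close to 11.97 regardless of the correlation.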

                 Strategies:                          Relative to optimal:
$\rho_{X,Y}$   Optimal   Matching   Mean    50/50     Matching   Mean    50/50
0.1            11.91     16.05      11.97   12.83     1.35       1.01*   1.08
0.2            11.72     15.13      11.97   12.26     1.29       1.02*   1.05
0.3            11.43     14.16      11.98   11.67     1.24       1.05    1.02*
0.4            10.98     13.12      11.97   11.04     1.19       1.09    1.01*
0.5            10.36     11.96      11.97   10.36     1.15       1.16    1.00*
0.6             9.58     10.72      11.97    9.66     1.12       1.25    1.01*
0.7             8.55      9.27      11.96    8.87     1.08       1.40    1.04*
0.8             7.18      7.57      11.98    8.03     1.05*      1.67    1.12
0.9             5.22      5.35      11.96    7.08     1.02*      2.29    1.36

Table 3. Left panel: Mean absolute deviations of different strategies (columns) as a function of correlations between X and Y (rows), based on simulations involving 1,000,000 trials. Right panel: Mean absolute deviations for different strategies relative to optimal. The lowest value in each row of the right panel is marked with an asterisk.

Diagnostic tasks

In diagnostic or backward inference, the focus is on interpreting the occurrence of the target Y in light of the value of the cue X and any other factors. The possibility of regression effects is typically crowded out by a drive for causal explanations. Moreover, this drive is likely to be stronger the larger the discrepancy or gap between the observed values of X and Y (Mackie, 1974). There may in fact be several compelling but spurious causal candidates available in any given situation, and people typically have little difficulty in generating good stories (Kahneman & Tversky, 1973; Kahneman, 2011).

What is critical is whether such interpretations are challenged by experience. In many cases, this does not occur, often because the individual does not seek disconfirmation but also because disconfirmation may not be possible. The learning environment is wicked in the sense that erroneous causal explanations are necessarily consistent with past observations and may then persist if beliefs go unchallenged (Einhorn & Hogarth, 1978). What the decision maker lacks is a readily available non-causal counterfactual that can be used to test the hypothesis of regression effects.
Specifically, if the value of the target Y were unknown, what would have been the best prediction given the cue X (i.e., consistent with regression effects)? Fortunately, this can be easily calculated using the 50/50 model outlined above. That is, the decision maker can construct a non-causal counterfactual using an equally weighted composite of the value of the cue X and the mean of the target Y, without needing to know the true value of $\rho_{X,Y}$. For example, consider our IQ scenario and imagine that a person has scored 115 on the second test after scoring 135 on the first. Since 135 is an excellent result, there is an immediate desire to explain the relatively poor outcome on the second test. However, using the 50/50 rule one can quickly construct a non-causal counterfactual of about 118 (117.5, i.e., 50% of 135 plus 50% of 100) against which to calibrate the observed 115. And indeed, by this analysis a score of 115 would probably not merit a specific, causal explanation.

Conclusion

We identify and discuss three experiential reasons why decision makers fail to correctly intuit regression toward the mean. First, for any particular phenomenon, it is difficult to identify regression effects from experience unless confronted by a combination of extreme observations and noisy processes. But, by definition, this can only occur rarely within a phenomenon. In all other cases, the probability of failing to observe regression toward the mean is significant. Second, if people use matching strategies (based only on the cue X), the errors they observe will be highly correlated with those of optimal strategies that account for regression effects. Thus, it is not clear that outcome feedback will induce people to shift from using matching strategies. Third, spurious causal explanations can survive because they are not challenged by disconfirming feedback.

We distinguish between predictive and diagnostic inference tasks and suggest how to account for regression effects. For the former, we identify a simple 50/50 rule that gives equal weight to the cue X and the mean of the target Y. This has close to optimal predictive performance across a wide range of levels of cue-target correlations. For the latter, we note the need for a non-causal counterfactual that can be compared to the target outcome Y. Fortunately, this can be provided by the same 50/50 rule, which is used to calculate the target Y that would have been expected based on the cue X. In other words, the 50/50 rule can be used in both prediction and diagnosis.
Finally, people should clearly be made aware of the possibility and nature of regression effects. However, they also need simple, practical mechanisms for dealing with them. The 50/50 rule provides an effective remedy for this need.
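As a minimal sketch of how the 50/50 rule serves both tasks, the snippet below computes the equally weighted prediction and the gap between an observed outcome and that non-causal counterfactual. It assumes, as in the IQ scenario, that cue and target are measured on the same scale with the same mean and standard deviation; the function names are hypothetical and chosen only for illustration.

```python
def fifty_fifty(cue: float, target_mean: float) -> float:
    """50/50 rule: equal weight to the cue and the mean of the target.

    Assumes cue and target share the same scale (an assumption of this
    sketch, matching the paper's IQ scenario).
    """
    return 0.5 * cue + 0.5 * target_mean


def gap_from_counterfactual(cue: float, observed: float, target_mean: float) -> float:
    """Observed outcome minus the non-causal counterfactual from the 50/50 rule."""
    return observed - fifty_fifty(cue, target_mean)


if __name__ == "__main__":
    # Prediction: first sitting 135, population mean 100.
    print(fifty_fifty(135, 100))                   # 117.5
    # Diagnosis: second sitting turned out to be 115.
    print(gap_from_counterfactual(135, 115, 100))  # -2.5
```

A gap of 2.5 points is small relative to a standard deviation of 15, which is the sense in which the observed 115 calls for no special causal story.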

References

Bland, J. M., & Altman, D. G. (1994). Statistics notes: Some examples of regression towards the mean. British Medical Journal, 309(6957), 780.
Cox, A. D., & Summers, J. O. (1987). Heuristics and biases in the intuitive prediction of retail sales. Journal of Marketing Research, 24(3), 290-297.
Denrell, J., Fang, C., & Liu, C. (2014). Perspective: Chance explanations in the management sciences. Organization Science, 26(3), 923-940.
Einhorn, H. J., & Hogarth, R. M. (1978). Confidence in judgment: Persistence of the illusion of validity. Psychological Review, 85, 395-416.
Fiedler, K., & Unkelbach, C. (2014). Regressive judgment: Implications of a universal property of the empirical world. Current Directions in Psychological Science, 23, 361-367.
Gilovich, T. (2008). How we know what isn't so. New York, NY: Simon and Schuster.
Hogarth, R. M. (2001). Educating intuition. Chicago, IL: University of Chicago Press.
Hogarth, R. M., & Soyer, E. (2011). Sequentially simulated outcomes: Kind experience vs. non-transparent description. Journal of Experimental Psychology: General, 140, 434-463.
Hogarth, R. M., Lejarraga, T., & Soyer, E. (2015). The two settings of kind and wicked learning environments. Current Directions in Psychological Science, 24, 379-385.
Kahneman, D. (2011). Thinking, fast and slow. New York, NY: Farrar, Straus and Giroux.
Kahneman, D., & Tversky, A. (1973). On the psychology of prediction. Psychological Review, 80, 237-251.
Mackie, J. L. (1974). The cement of the universe: A study of causation. Oxford, UK: Clarendon Press.
McDonald, C. J., Mazzuca, S. A., & McCabe, G. P. (1983). How much of the placebo effect is really statistical regression? Statistics in Medicine, 2(4), 417-427.
Morton, V., & Torgerson, D. J. (2003). Effect of regression to the mean on decision making in health care. British Medical Journal, 326(7398), 1083.
Stigler, S. M. (1997). Regression toward the mean, historically considered. Statistical Methods in Medical Research, 6, 103-114.
Tversky, A., & Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases. Science, 185, 1124-1131.

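Appendix

Here we derive the correlations of the errors of the mean and 50/50 models with those of the optimal (regression) model. Following the derivation of the correlation for the matching model in the main text, consider the simple regression of Y on X where both Y and X have zero means and unit standard deviations and $\varepsilon$ is an error term distributed $N(0, \sigma_\varepsilon)$. That is,

$$Y = \beta x + \varepsilon \tag{1}$$

From this equation, it is clear that the optimal prediction of Y for any X is $\beta x$. Because X and Y are standardized, $\beta = \rho_{yx}$ and $\sigma_\varepsilon^2 = 1 - \rho_{yx}^2$.

Mean strategy

Errors of regression model:

$$(\beta x + \varepsilon) - \beta x = \varepsilon \tag{2}$$

Errors of the mean model:

$$(\beta x + \varepsilon) - 0 = \beta x + \varepsilon \tag{3}$$

The correlation of the errors of the mean model with those of the optimal model is

$$\frac{\mathrm{Cov}\left[\beta X + \varepsilon,\ \varepsilon\right]}{\left[\beta^2 \sigma_X^2 + \sigma_\varepsilon^2\right]^{1/2}\left[\sigma_\varepsilon^2\right]^{1/2}} = \frac{\sigma_\varepsilon^2}{\left[\beta^2 + (1 - \rho_{yx}^2)\right]^{1/2} \sigma_\varepsilon} \tag{4}$$

$$= \frac{\sigma_\varepsilon}{\left(\rho_{yx}^2 + 1 - \rho_{yx}^2\right)^{1/2}} = \frac{\sigma_\varepsilon}{1^{1/2}} = \sigma_\varepsilon = \left(1 - \rho_{yx}^2\right)^{1/2} \tag{5}$$

50/50 model

The errors from this model can be expressed:

$$(\beta x + \varepsilon) - 0.5x = (\beta - 0.5)x + \varepsilon \tag{6}$$

The correlation of the errors of the 50/50 model with those of the optimal model is

$$\frac{\mathrm{Cov}\left[(\beta - 0.5)X + \varepsilon,\ \varepsilon\right]}{\left[(\beta - 0.5)^2 \sigma_X^2 + \sigma_\varepsilon^2\right]^{1/2}\left[\sigma_\varepsilon^2\right]^{1/2}} \tag{7}$$

$$= \frac{\sigma_\varepsilon^2}{\left[\beta^2 - \beta + 0.25 + \sigma_\varepsilon^2\right]^{1/2} \sigma_\varepsilon} = \frac{\sigma_\varepsilon}{\left(\rho_{yx}^2 - \rho_{yx} + 0.25 + (1 - \rho_{yx}^2)\right)^{1/2}} = \frac{\sigma_\varepsilon}{\left(1.25 - \rho_{yx}\right)^{1/2}} = \frac{\left(1 - \rho_{yx}^2\right)^{1/2}}{\left(1.25 - \rho_{yx}\right)^{1/2}} \tag{8}$$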
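The closed-form results in equation (4) of the main text and equations (5) and (8) of this Appendix can be checked numerically against Table 2. The sketch below is illustrative only; it assumes standardized variables, so that β equals the correlation and the error variance is one minus its square, and the function names are hypothetical.

```python
from math import sqrt


def sigma_eps(rho: float) -> float:
    """Error standard deviation for standardized X and Y: sqrt(1 - rho^2)."""
    return sqrt(1 - rho ** 2)


def corr_matching(rho: float) -> float:
    """Main-text equation (4) with beta = rho."""
    return sigma_eps(rho) / sqrt((rho - 1) ** 2 + sigma_eps(rho) ** 2)


def corr_mean(rho: float) -> float:
    """Appendix equation (5)."""
    return sigma_eps(rho)


def corr_fifty_fifty(rho: float) -> float:
    """Appendix equation (8)."""
    return sigma_eps(rho) / sqrt(1.25 - rho)


if __name__ == "__main__":
    print("rho  matching  mean  50/50")
    for i in range(1, 10):
        rho = i / 10
        print(f"{rho:.1f}  {corr_matching(rho):.2f}      "
              f"{corr_mean(rho):.2f}  {corr_fifty_fifty(rho):.2f}")
```

Expanding the denominator of the matching expression gives 2(1 − ρ), so that correlation reduces to √((1 + ρ)/2), which makes clear why it never falls below about 0.71 for non-negative correlations.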