Economics 471 Lecture 1. Regression to Mediocrity: Galton's Study of the Inheritance of Height

University of Illinois, Spring 2006
Department of Economics, Roger Koenker

Arguably, the most important statistical graphic ever produced is Francis Galton's (1885) figure illustrating regression to the mean, reproduced badly below as Figure 1. In it Galton plots children's height versus parents' height for a sample of 928 children. He begins by dividing the plane into one-inch squares and entering the frequency counts for each square. The resulting histogram appeared too rough, so he smoothed the plot by averaging the counts within each group of four adjacent squares and plotting the averaged count at the intersection of the cell boundaries. Not content to invent regression in one plot, he managed to invent bivariate kernel density estimation, too! After smoothing, the counts appeared more regular and he enlisted the help of the Cambridge mathematician, J.H. Dickson, to draw elliptical contours corresponding to level curves of the underlying population density.

Now suppose we wished to predict children's height based on parental height, say the average height of the parents, which we will call, following Galton, the height of the midparent. What would we do? One approach, given the graphical apparatus at hand, would be to find the most likely value of the child's height given the parents' height. That is, for any given value of the midparent height we could ask: what value of the child's height puts us on the highest possible contour of the joint density? This obviously yields a locus of tangencies of the ellipses with horizontal lines in the figure. These conditional modes, given the joint normality implicit in the elliptical contours, are also the conditional medians and means. The slope of the line describing this locus of tangencies is roughly 2/3, so a child with midparent 3 inches taller than average can be expected to be (will most probably be) only 2 inches taller than average.
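The smoothing step and the 2/3 prediction rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are simulated rather than Galton's, the population mean of 68 inches and the standard deviations are hypothetical values chosen for the example, and the names `smooth_counts` and `predict_child` are made up here.

```python
import numpy as np


def smooth_counts(counts):
    """Average each group of four adjacent cells (a 2x2 block).

    Each average belongs to the grid intersection shared by the four
    cells, so an (m, n) table of counts yields an (m-1, n-1) table.
    """
    c = np.asarray(counts, dtype=float)
    return (c[:-1, :-1] + c[:-1, 1:] + c[1:, :-1] + c[1:, 1:]) / 4.0


def predict_child(midparent, mean_height=68.0, slope=2.0 / 3.0):
    """Conditional-mean prediction: shrink the midparent's deviation
    from the population mean by the regression slope.

    mean_height is a hypothetical value, not taken from Galton's data.
    """
    return mean_height + slope * (midparent - mean_height)


# Simulated heights in inches (illustrative only, not Galton's sample).
rng = np.random.default_rng(0)
n = 928  # same sample size as in the text, but the draws are synthetic
midparent = rng.normal(68.0, 1.8, size=n)
child = predict_child(midparent) + rng.normal(0.0, 2.2, size=n)

# Frequency counts on one-inch squares, then the four-cell smoothing.
edges = np.arange(58, 80)  # one-inch cell boundaries (assumed range)
counts, _, _ = np.histogram2d(midparent, child, bins=[edges, edges])
smoothed = smooth_counts(counts)

# A midparent 3 inches above the mean predicts a child about 2 inches above.
print(predict_child(71.0) - 68.0)
```

Averaging overlapping 2x2 blocks is a moving average with a square window, which is why the text can describe it as an early form of bivariate kernel density estimation.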

Figure 1. Galton's (1889) Regression to the Mean Plot

Galton termed this regression towards mediocrity, and paraphrasing Abraham Lincoln we might strengthen this to regression of the mediocre, to the mediocre, and for the mediocre. Children are more mediocre than their parents: they tend to be, on average, closer to the mean height, the mean weight, the mean intelligence of the population than were their parents. In the case of height we have seen that children of parents who are one inch taller than the general population tend on average to be only about two-thirds of an inch taller than the population. But before you despair, I should hasten to point out, as Galton did, that parents are also more mediocre than their children. If we run the usual conventions of temporal causality backward and ask how the heights of parents compare to the heights of their children, we find (looking at the figure) that children who are unusually tall have parents who are closer than they are to the mean height of the population, and children who are unusually short also have parents closer than they are to the mean. Stigler (1997) provides a fascinating guide to Galton's own thinking about this idea, and to the elusive nature of its reception in subsequent statistical research.

It is a remarkable feature of the conditional densities of jointly Gaussian random variables that the conditioning induces what we may call a pure location shift. In Galton's original example the height of the midparent alters only the location of the center of the conditional density of the child's height; the dispersion and shape of the conditional density are invariant to the height of the midparent. This is, of course, the essential feature of the classical linear regression model: the entire effect of the covariates on the response is captured by the location shift $E(Y \mid X = x) = x^\top \beta$, while the remaining randomness of Y given X may be modeled as an additive error independent of X. Just to confirm that this empirical regularity hasn't been repealed over the intervening 100 years, I illustrate in Figure 2 a similar plot for a sample of Finnish boys. The results are quite similar to those obtained by Galton.

What does this have to do with econometrics? To answer this question we need to step forward in time about 50 years and consider a book published in 1933 by Horace Secrist. Secrist was a professor of economics at Northwestern University, trained at Chicago, and an expert in what we would now refer to as Industrial Organization. The book was titled The Triumph of Mediocrity in Business and had occupied 10 years of his research. In it Secrist showed in excruciating detail that if you grouped firms into performance categories in some initial year, and then followed them in subsequent years, the initially most successful tended to do worse over time, while the least successful tended to improve. Secrist's conclusion was that American business was converging to mediocrity. We will come back to this example later in the course.

References

Stigler, S. (1996) The History of Statistics in 1933, Statistical Science, 11, 244-252.
Friedman, M. (1992) Do Old Fallacies Ever Die?, J. of Economic Literature, 30, 2129-2132.
Hotelling, H. (1933) Review of Secrist, JASA, 28, 463-4.
Hotelling, H. (1934) Response to Secrist, JASA, 29, 198-9.
Secrist, H. (1933) The Triumph of Mediocrity, Northwestern U. Press.

Figure 2. A Modern Galton Inheritance-of-Height Plot (boys; axes: midparent height in cm and child's height in cm): The plot depicts the heights of 236 Finnish boys at age 17 versus the mean height of their parents. The two ellipses represent 50 and 90 percent confidence regions estimated for the pairs of points based upon the conventional bivariate Gaussian model for the data. The dotted lines depict the major and minor axes of the ellipses. The solid line represents the least squares fit; that is, it is the line that minimizes the sum of squared vertical distances from the points to the line. The slope of the line is β = 0.76, somewhat larger than the slope of 2/3 obtained by Galton. What is the dashed line?
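The two-way nature of regression toward the mean, and the grouping exercise that Secrist carried out, can be checked on simulated data. The sketch below is only illustrative: the pairs are standardized bivariate Gaussian draws with an assumed correlation of 0.5 (not an estimate from Galton's, the Finnish, or Secrist's data), and `ols_slope` is a helper name invented for this example.

```python
import numpy as np


def ols_slope(x, y):
    """Least squares slope of y on x: the line minimizing the sum of
    squared vertical distances has slope cov(x, y) / var(x)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    return float(np.dot(xc, y - y.mean()) / np.dot(xc, xc))


# Standardized pairs with an assumed correlation rho (hypothetical value).
rng = np.random.default_rng(471)
n, rho = 100_000, 0.5
x = rng.standard_normal(n)  # e.g. parent, or firm in the initial year
y = rho * x + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)  # child, or later year

# Regression runs toward the mean in both directions: both slopes are
# close to rho, hence below one.
slope_y_on_x = ols_slope(x, y)
slope_x_on_y = ols_slope(y, x)

# Secrist-style grouping: pick the top quarter on one variable and
# compare the group averages of the two variables.
top_x = x > np.quantile(x, 0.75)
top_y = y > np.quantile(y, 0.75)

print("slope of y on x:", round(slope_y_on_x, 3))
print("slope of x on y:", round(slope_x_on_y, 3))
print("top quarter by x: mean x, mean y =", x[top_x].mean(), y[top_x].mean())
print("top quarter by y: mean y, mean x =", y[top_y].mean(), x[top_y].mean())
print("std of x, std of y =", x.std(), y.std())
```

In this simulation the group selected as best in one period looks more ordinary in the other period, whichever period is used for the selection, while the overall spread of the two variables is the same by construction. Grouped averages drifting toward the mean therefore do not by themselves show that the population is converging.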