Data, frequencies, and distributions. Martin Bland. Clinical Biostatistics


Transcription:

Clinical Biostatistics
Data, frequencies, and distributions
Martin Bland, Professor of Health Statistics, University of York
http://martinbland.co.uk/

Types of data

Qualitative data arise when individuals may fall into separate classes, e.g. diagnosis, alive/dead. A qualitative variable is also termed a categorical variable or an attribute.

Quantitative data are numerical, arising from counts or measurements. If the values of the measurements are integers (whole numbers), the data are said to be discrete, e.g. family size. If the measurements can take any value in a range, such as height or weight, the data are said to be continuous, e.g. blood pressure, serum cholesterol.

Variables are qualities or quantities which vary from one member of a sample to another. A statistic is anything calculated from the data alone.

Frequency distributions

Source of referral of patients in a physiotherapy trial (Frost et al., 2004):

Source of referral      Frequency   Relative frequency
General practitioner       256            89.8%
Consultant                  18             6.3%
Triage *                    10             3.5%
Sports centre                1             0.4%
Total                      285           100.0%

Source of referral is a qualitative variable.

The count of individuals having a particular quality is called the frequency of that quality. The proportion of individuals having the quality is called the relative frequency or proportional frequency. The relative frequency of general practitioner referral is 256/285 = 0.898, or 89.8%.

The set of frequencies of all the possible categories is called the frequency distribution of the variable.

Frost H, Lamb SE, Doll HA, Carver PT, Stewart-Brown S. (2004) Randomised controlled trial of physiotherapy compared with advice for low back pain. British Medical Journal 329, 708-711.
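The relative frequency calculation can be sketched in a few lines of Python. This is only an illustration: the counts are read from the referral table with the digits lost in transcription restored (an assumption, checked against the printed percentages), and the function name `relative_frequencies` is made up for this example.

```python
# Frequencies of source of referral (Frost et al., 2004).
# Assumption: counts restored from the slide table (256 + 18 + 10 + 1 = 285).
referrals = {
    "General practitioner": 256,
    "Consultant": 18,
    "Triage": 10,
    "Sports centre": 1,
}


def relative_frequencies(freqs):
    """Return each category's frequency as a proportion of the total."""
    total = sum(freqs.values())
    return {category: count / total for category, count in freqs.items()}


rel = relative_frequencies(referrals)
for category, proportion in rel.items():
    # Print frequency and relative frequency side by side, as in the table.
    print(f"{category:<22}{referrals[category]:>5}{proportion:>8.1%}")
print(f"{'Total':<22}{sum(referrals.values()):>5}{sum(rel.values()):>8.1%}")
```

The relative frequencies always sum to 1 (100%), which is a quick check that no category has been missed.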

Ordered categories

Mobility of patients recruited to the VenUS I trial (data of Nelson et al., 2004):

Mobility                Frequency   Relative    Cumulative   Cumulative relative
                                    frequency   frequency    frequency
Walks freely               238        62.1%        238            62.1%
Walks with difficulty      142        37.1%        380            99.2%
Immobile                     3         0.8%        383           100.0%
Total                      383       100.0%        383           100.0%

The cumulative frequency for a value of a variable is the number of individuals with values less than or equal to that value. The relative cumulative frequency for a value is the proportion of individuals in the sample with values less than or equal to that value.

Nelson EA, Iglesias CP, Cullum N, Torgerson DJ. (2004) Randomized clinical trial of four-layer and short-stretch compression bandages for venous leg ulcers (VenUS I). British Journal of Surgery 91, 1292-1299.

Discrete quantitative variable: number of episodes of venous ulcers after first onset for patients recruited to the VenUS I trial

Number of                Relative        Relative cumulative
episodes    Frequency    frequency (%)   frequency (%)
    0           11            2.9              2.9
    1          145           38.7             41.6
    2          101           26.9             68.5
    3           39           10.4             78.9
    4           23            6.1             85.1
    5           14            3.7             88.8
    6            9            2.4             91.2
    7            4            1.1             92.3
    8            6            1.6             93.9
    9            1            0.3             94.1
   10            9            2.4             96.5
  ...
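The cumulative columns of the mobility table can be reproduced with a running total. The sketch below is a minimal illustration, assuming the frequencies 238, 142 and 3 as restored from the slide; the variable names are chosen for this example only.

```python
from itertools import accumulate

# Mobility categories in their natural order, with their frequencies
# (assumption: digits lost in transcription restored, total 383).
mobility = [
    ("Walks freely", 238),
    ("Walks with difficulty", 142),
    ("Immobile", 3),
]

counts = [count for _, count in mobility]
total = sum(counts)

# Cumulative frequency: number of individuals at or below each category.
cumulative = list(accumulate(counts))
# Relative cumulative frequency: the same running total as a proportion.
cumulative_relative = [c / total for c in cumulative]

for (label, count), cum, cum_rel in zip(mobility, cumulative, cumulative_relative):
    print(f"{label:<24}{count:>5}{count / total:>8.1%}{cum:>6}{cum_rel:>8.1%}")
```

Cumulative frequencies only make sense when the categories have an order, as they do here and for any quantitative variable.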

Discrete quantitative variable: number of episodes of venous ulcers after first onset for patients recruited to the VenUS I trial (continued; ? marks a digit that is illegible in the source)

Number of                Relative        Relative cumulative
episodes    Frequency    frequency (%)   frequency (%)
  ...
   13            1            0.3             96.8
   15            1            0.3             97.1
   17            1            0.3             97.3
   2?            3            0.8             98.1
   26            1            0.3             98.4
   29            1            0.3             98.7
   4?            1            0.3             98.9
   5?            3            0.8             99.7
   64            1            0.3            100.0
Total          375          100.0            100.0

We can count the number of times each possible value occurs to get the frequency distribution.

Continuous variable: serum cholesterol (mmol/l) measured on a sample of 86 stroke patients (data of Markus et al., 1995)

 3.7   4.8   5.4   5.6   6.1   6.4   7.0   7.6   8.7
 3.8   4.9   5.4   5.6   6.1   6.5   7.0   7.6   8.9
 3.8   4.9   5.5   5.7   6.1   6.5   7.1   7.6   9.3
 4.4   4.9   5.5   5.7   6.2   6.6   7.1   7.7   9.5
 4.5   5.0   5.5   5.7   6.3   6.7   7.2   7.8  10.2
 4.5   5.1   5.6   5.8   6.3   6.7   7.3   7.8  10.4
 4.5   5.1   5.6   5.8   6.4   6.8   7.4   7.8
 4.7   5.2   5.6   5.9   6.4   6.8   7.4   8.2
 4.7   5.3   5.6   6.0   6.4   7.0   7.5   8.3
 4.8   5.3   5.6   6.1   6.4   7.0   7.5   8.6

Markus HS, Barley J, Lunt R, Bland JM, Jeffery S, Carter ND, Brown MM. (1995) Angiotensin-converting enzyme gene deletion polymorphism: a new risk factor for lacunar stroke but not carotid atheroma. Stroke 26, 1329-33.

Continuous variable: serum cholesterol (mmol/l) measured on a sample of 86 stroke patients (data of Markus et al., 1995)

As most of the values occur only once, counting the number of occurrences does not help.

Divide the serum cholesterol scale into class intervals, e.g. from 3.0 to 4.0, from 4.0 to 5.0, and so on. Count the number of individuals with serum cholesterols in each class interval.

The class intervals should not overlap, so we must decide which interval contains the boundary point to avoid it being counted twice. It is usual to put the lower boundary of an interval into that interval and the higher boundary into the next interval. Thus the interval starting at 3.0 and ending at 4.0 contains 3.0 but not 4.0. We can write this as "3.0 -" or "3.0 - 4.0-" or "3.0 - 3.999".

Continuous variable: frequency distribution of serum cholesterol (mmol/l)

Cholesterol   Frequency   Relative frequency
  3.0 -            3           0.035
  4.0 -           11           0.128
  5.0 -           24           0.279
  6.0 -           20           0.233
  7.0 -           19           0.221
  8.0 -            5           0.058
  9.0 -            2           0.023
 10.0 -            2           0.023
Total             86           1.000

The frequency distribution depends on the choice of interval width. Shape is the important thing.

Graphical presentation: histograms and other frequency graphs

The most common way of depicting a frequency distribution is by a histogram. This is a diagram in which the class intervals lie along an axis and rectangles with heights or areas proportional to the frequencies are erected on them.

[Figure: histograms of serum cholesterol (mmol/l) on the frequency scale and on the relative frequency scale]
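The grouping into class intervals can be sketched as follows. This is an illustration under stated assumptions: the 86 cholesterol values are restored from the slide's data table (digits lost in transcription were inferred from the sorted column order and checked against the printed frequencies), and `class_interval_counts` is a hypothetical helper written for this example.

```python
import math
from collections import Counter

# Serum cholesterol (mmol/l), 86 stroke patients (data of Markus et al., 1995).
# Assumption: values restored from the slide's table, in sorted order.
cholesterol = [
    3.7, 3.8, 3.8, 4.4, 4.5, 4.5, 4.5, 4.7, 4.7, 4.8,
    4.8, 4.9, 4.9, 4.9, 5.0, 5.1, 5.1, 5.2, 5.3, 5.3,
    5.4, 5.4, 5.5, 5.5, 5.5, 5.6, 5.6, 5.6, 5.6, 5.6,
    5.6, 5.6, 5.7, 5.7, 5.7, 5.8, 5.8, 5.9, 6.0, 6.1,
    6.1, 6.1, 6.1, 6.2, 6.3, 6.3, 6.4, 6.4, 6.4, 6.4,
    6.4, 6.5, 6.5, 6.6, 6.7, 6.7, 6.8, 6.8, 7.0, 7.0,
    7.0, 7.0, 7.1, 7.1, 7.2, 7.3, 7.4, 7.4, 7.5, 7.5,
    7.6, 7.6, 7.6, 7.7, 7.8, 7.8, 7.8, 8.2, 8.3, 8.6,
    8.7, 8.9, 9.3, 9.5, 10.2, 10.4,
]


def class_interval_counts(values, width=1.0):
    """Count values in the intervals [k*width, (k+1)*width).

    The lower boundary belongs to its interval and the upper boundary
    to the next one, so no observation is counted twice.
    """
    counts = Counter(math.floor(v / width) * width for v in values)
    return dict(sorted(counts.items()))


table = class_interval_counts(cholesterol)
n = len(cholesterol)
for lower, freq in table.items():
    print(f"{lower:5.1f} -  {freq:3d}  {freq / n:.3f}")
print(f"Total    {n:3d}  {sum(table.values()) / n:.3f}")
```

Calling `class_interval_counts(cholesterol, width=0.5)` gives a finer table; the counts change but the overall shape of the distribution should not.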

Histograms of cholesterol: frequency scale

[Figure: two histograms of serum cholesterol (mmol/l) on the frequency scale]

Different starting points and interval widths, same shape.

Histograms of cholesterol: frequency density scale

[Figure: histograms of serum cholesterol (mmol/l) on the frequency scale and on the frequency density scale]

In this case it is the area under the histogram which represents the frequency. For the interval 3.75 to 4.25- mmol/l, the frequency density is 4 observations per mmol/l. The width of the interval is 0.5 mmol/l, so the frequency is 4 × 0.5 = 2.

Histograms of cholesterol: frequency density and relative frequency density scales

[Figure: histograms of serum cholesterol (mmol/l) on the frequency density scale and on the relative frequency density scale]

If we plot the relative frequency density, the proportion of observations per unit of the variable, the total area under the histogram is 1.0.
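A short sketch may make the area interpretation concrete. It assumes the cholesterol frequency table from the lecture, with the three sparse upper intervals merged into one wider interval purely for illustration; the upper limit of 11.0 mmol/l for the last interval is also an assumption.

```python
# (lower, upper, frequency) for class intervals of unequal width.
intervals = [
    (3.0, 4.0, 3),
    (4.0, 5.0, 11),
    (5.0, 6.0, 24),
    (6.0, 7.0, 20),
    (7.0, 8.0, 19),
    (8.0, 11.0, 9),  # 8.0-, 9.0- and 10.0- merged: 5 + 2 + 2 observations
]
n = sum(freq for _, _, freq in intervals)


def frequency_density(lower, upper, freq):
    """Observations per unit of the variable (here, per mmol/l)."""
    return freq / (upper - lower)


densities = [frequency_density(*interval) for interval in intervals]
relative_densities = [d / n for d in densities]

# Area of each rectangle = density * width; the areas add up to n,
# or to 1 on the relative frequency density scale.
area = sum(d * (hi - lo) for d, (lo, hi, _) in zip(densities, intervals))
relative_area = sum(
    d * (hi - lo) for d, (lo, hi, _) in zip(relative_densities, intervals)
)

print(f"total area on the frequency density scale: {area:.1f}")
print(f"total area on the relative frequency density scale: {relative_area:.3f}")
```

On the plain frequency scale the merged interval would be drawn with height 9, making the upper tail look far heavier than it is. On the density scale its height is 3 observations per mmol/l.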

Frequency density enables us to smooth histograms. On the frequency scale, combining intervals gives a misleading impression.

[Figure: histograms of systolic blood pressure (mm Hg) on the frequency scale and on the frequency density scale]

For a discrete variable we can separate the bars:

[Figure: bar chart of frequency against episodes since first onset of ulcer]

This emphasises the discreteness.

Frequency polygon: join the tops of the bars in the histogram.

[Figure: frequency polygon and relative frequency polygon]

Good for showing more than one distribution on the same axes.

[Figure: relative frequency polygons for stroke cases and controls]

The mode

The most frequently occurring value is called the mode of the distribution. The outer areas are the tails.

Unimodal distributions have one mode.

[Figure: histogram with the lower tail, mode and upper tail marked]

[Figure: frequency against episodes since first onset of ulcer]

Bimodal distributions have two modes.

[Figure: histogram of systolic blood pressure (mm Hg). Systolic blood pressure in 21 patients admitted to an intensive therapy unit.]

There are two populations.

The parts of the histogram near the extremes are called the tails of the distribution.

If the tail on the right is of similar length to the tail on the left, the distribution is symmetrical:

[Figure: histogram of height (cm). Heights of 222 women admitted to the VenUS I trial.]

If the tail on the right is longer than the tail on the left, the distribution is skew to the right or positively skew:

[Figure: histogram on the serum cholesterol scale, and frequency against episodes since first onset of ulcer]

If the tail on the left is longer than the tail on the right, the distribution is skew to the left or negatively skew:

[Figure: histogram of gestational age (weeks). Gestational age at birth.]

Most medical data have unimodal distributions. Most medical data follow either a symmetrical or positively skew distribution.

Medians and quantiles

The quantiles are values which divide the distribution such that there is a given proportion of observations below the quantile.

The median is the central value of the distribution, such that half the points are less than or equal to it and half are greater than or equal to it. For the cholesterol data the median is 6.15, midway between the 43rd and 44th of the 86 observations. If we have an odd number of points, the central value is an actual observation. If we have an even number of points, we choose a value midway between the two central values.

The three quartiles divide the distribution into four equal parts. The second quartile is the median. The first quartile has 25% of observations below it, the third quartile has 25% of observations above it.

[Figure: histogram of serum cholesterol (mmol/l) with the first quartile, median and third quartile marked]

Note that the quartile is the dividing point, not the area below it, which we should call a quarter. You will often see this misuse of the term.
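The median rule for odd and even sample sizes can be sketched as below. The cholesterol values are restored from the slide's data table (an assumption), and the quartiles come from Python's `statistics.quantiles`, whose default interpolation rule is only one of several conventions, so other software may report slightly different quartiles.

```python
import statistics

# Serum cholesterol (mmol/l), 86 stroke patients (data of Markus et al., 1995).
# Assumption: values restored from the slide's table, in sorted order.
cholesterol = [
    3.7, 3.8, 3.8, 4.4, 4.5, 4.5, 4.5, 4.7, 4.7, 4.8,
    4.8, 4.9, 4.9, 4.9, 5.0, 5.1, 5.1, 5.2, 5.3, 5.3,
    5.4, 5.4, 5.5, 5.5, 5.5, 5.6, 5.6, 5.6, 5.6, 5.6,
    5.6, 5.6, 5.7, 5.7, 5.7, 5.8, 5.8, 5.9, 6.0, 6.1,
    6.1, 6.1, 6.1, 6.2, 6.3, 6.3, 6.4, 6.4, 6.4, 6.4,
    6.4, 6.5, 6.5, 6.6, 6.7, 6.7, 6.8, 6.8, 7.0, 7.0,
    7.0, 7.0, 7.1, 7.1, 7.2, 7.3, 7.4, 7.4, 7.5, 7.5,
    7.6, 7.6, 7.6, 7.7, 7.8, 7.8, 7.8, 8.2, 8.3, 8.6,
    8.7, 8.9, 9.3, 9.5, 10.2, 10.4,
]


def median(values):
    """Central value: the middle observation, or midway between the two middle ones."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd n: an actual observation
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: midpoint


# The three quartiles; the second one is the median.
q1, q2, q3 = statistics.quantiles(cholesterol, n=4)

print(f"median = {median(cholesterol):.2f}")
print(f"quartiles = {q1:.2f}, {q2:.2f}, {q3:.2f}")
```

With 86 observations the two central values are the 43rd and 44th, which is why the median falls between two observed values here.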

Medians and quantiles

We often divide the distribution into 100 centiles or percentiles. The median is thus the 50th centile.

The mean

The arithmetic mean or average, usually referred to simply as the mean, is found by taking the sum of the observations and dividing by their number. The mean is often denoted by a little bar over the symbol for the variable, e.g. x̄.

The sample mean has much nicer mathematical properties than the median and is thus more useful for the comparison methods described later. The median is a very useful descriptive statistic, but not much used for other purposes.

Median, mean and skewness:

Mean cholesterol = 6.34, median cholesterol = 6.15.
Mean height = 162.2, median height = 162.6.
Mean ulcer episodes = 3.4, median episodes = 2.

If the distribution is symmetrical the sample mean and median will be about the same, but in a skew distribution they will usually be different. If the distribution is skew to the right, as for serum cholesterol, the mean will usually be greater. If it is skew to the left the median will usually be greater. This is because the values in the tails affect the mean but not the median.

[Figure: histogram of serum cholesterol (mmol/l) with the mean and median marked]

Increasing the largest observation will pull the mean higher. It will not affect the median.

Variability

The mean and median are measures of the central tendency or position of the middle of the distribution. We shall also need a measure of the spread, dispersion or variability of the distribution.

The range is the difference between the highest and lowest values. This is a useful descriptive measure, but has two disadvantages. Firstly, it depends only on the extreme values and so can vary a lot from sample to sample. Secondly, it depends on the sample size. The larger the sample is, the further apart the extremes are likely to be.

We can get round this problem by using the interquartile range or IQR, the difference between the first and third quartiles, a useful descriptive measure.
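The effect of one extreme value on these summaries can be checked with a small experiment. The sample below is hypothetical, chosen only for illustration and not taken from the lecture data, and the quartiles again use the default convention of `statistics.quantiles`.

```python
import statistics

# Hypothetical sample, for illustration only.
sample = [4.5, 5.1, 5.6, 6.1, 6.4, 7.0, 8.7]
# The same sample with the largest observation increased by 10.
stretched = sample[:-1] + [18.7]


def summarise(data):
    """Return mean, median, range and interquartile range of the data."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "range": max(data) - min(data),
        "iqr": q3 - q1,
    }


before = summarise(sample)
after = summarise(stretched)
for name in before:
    print(f"{name:<7} before = {before[name]:6.2f}   after = {after[name]:6.2f}")
```

The mean and the range move with the extreme value, while the median and the IQR stay where they were. This is the sense in which the median and IQR are less sensitive to the tails.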

Variability

For use in the analysis of data, range and IQR are not satisfactory. Instead we use two other measures of variability: variance and standard deviation. These both measure how far observations are from the mean of the distribution. Variance is the average squared difference from the mean. Standard deviation is the square root of the variance.

Variance

Variance is an average squared difference from the mean. Note that if we have only one observation, we cannot do this: the mean is the observation itself and the difference is zero. We need at least two observations.

The sum of the squared differences from the mean is, on average, proportional to the number of observations minus one, called the degrees of freedom. Variance is estimated as the sum of the squared differences from the mean divided by the degrees of freedom.

Height: variance = 49.7 cm²
Cholesterol: variance = 1.96 (mmol/l)²
Episodes of ulceration: variance = 42.3 episodes²
Gestational age: variance = 5.24 weeks²

Variance is based on the squares of the observations and so is in squared units. This makes it difficult to interpret.
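Written as formulas, the verbal definitions above read as follows. The notation is an assumption made for this sketch: x_i for the i-th observation and n for the number of observations are not used on the slides, while the bar for the mean and s for the standard deviation follow the lecture.

```latex
\[
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i ,
\qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2 ,
\qquad
s = \sqrt{s^2} .
\]
```

Here n - 1 is the degrees of freedom. With a single observation (n = 1) the divisor is zero, which is the formal counterpart of the remark that at least two observations are needed.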

Standard deviation

The variance is calculated from the squares of the observations. This means that it is not in the same units as the observations. We take the square root, which will then have the same units as the observations and the mean. The square root of the variance is called the standard deviation, usually denoted by s.

Height: s = √49.7 = 7.1 cm.
Cholesterol: s = √1.96 = 1.40 mmol/l.
Episodes of ulceration: s = √42.3 = 6.5 episodes.

Height: s = √49.7 = 7.1 cm.

[Figure: histogram of height (cm) with mean - 2s, mean - s, mean, mean + s and mean + 2s marked]

Majority of observations within one SD of mean (usually 2/3 or more). Almost all within about two SD of mean (usually about 95%).

Cholesterol: s = √1.96 = 1.40 mmol/l.

[Figure: histogram of serum cholesterol (mmol/l) with mean - 2s, mean - s, mean, mean + s and mean + 2s marked]

Majority of observations within one SD of mean (usually 2/3 or more). Almost all within about two SD of mean (usually about 95%), but those outside may be all at one end.
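These statements can be checked on the cholesterol data. The sketch below assumes the 86 values as restored from the slide's data table; `count_within` is a hypothetical helper written for this example.

```python
import statistics

# Serum cholesterol (mmol/l), 86 stroke patients (data of Markus et al., 1995).
# Assumption: values restored from the slide's table, in sorted order.
cholesterol = [
    3.7, 3.8, 3.8, 4.4, 4.5, 4.5, 4.5, 4.7, 4.7, 4.8,
    4.8, 4.9, 4.9, 4.9, 5.0, 5.1, 5.1, 5.2, 5.3, 5.3,
    5.4, 5.4, 5.5, 5.5, 5.5, 5.6, 5.6, 5.6, 5.6, 5.6,
    5.6, 5.6, 5.7, 5.7, 5.7, 5.8, 5.8, 5.9, 6.0, 6.1,
    6.1, 6.1, 6.1, 6.2, 6.3, 6.3, 6.4, 6.4, 6.4, 6.4,
    6.4, 6.5, 6.5, 6.6, 6.7, 6.7, 6.8, 6.8, 7.0, 7.0,
    7.0, 7.0, 7.1, 7.1, 7.2, 7.3, 7.4, 7.4, 7.5, 7.5,
    7.6, 7.6, 7.6, 7.7, 7.8, 7.8, 7.8, 8.2, 8.3, 8.6,
    8.7, 8.9, 9.3, 9.5, 10.2, 10.4,
]

n = len(cholesterol)
mean = statistics.mean(cholesterol)
variance = statistics.variance(cholesterol)  # sum of squares divided by n - 1
sd = statistics.stdev(cholesterol)           # square root of the variance


def count_within(values, centre, spread, k):
    """Number of values no more than k * spread away from centre."""
    return sum(1 for v in values if abs(v - centre) <= k * spread)


within_one = count_within(cholesterol, mean, sd, 1)
within_two = count_within(cholesterol, mean, sd, 2)

print(f"mean = {mean:.2f}, variance = {variance:.2f}, s = {sd:.2f}")
print(f"within one SD:  {within_one}/{n} = {within_one / n:.1%}")
print(f"within two SDs: {within_two}/{n} = {within_two / n:.1%}")
```

For these data the observations more than two standard deviations from the mean all lie in the upper tail, in line with the remark that those outside may be all at one end.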

Standard deviation

Duration of venous ulcer: s = √189.3 = 13.8 months.

[Figure: histogram of duration of ulcer (months) with mean - 2s, mean - s, mean, mean + s and mean + 2s marked]

Majority of observations within one SD of mean (usually 2/3 or more). Almost all within about two SD of mean (usually about 95%), but those outside may be all at one end.

Gestational age: s = √5.242 = 2.29 weeks.

[Figure: histogram of gestational age (weeks) with mean - 2s, mean - s, mean, mean + s and mean + 2s marked]

Majority of observations within one SD of mean (usually 2/3 or more). Almost all within about two SD of mean (usually about 95%), but those outside may be all at one end.

Spotting skewness

If the mean is less than two standard deviations, two standard deviations below the mean will be negative. For any variable which cannot be negative, this tells us that the distribution must be positively skew.

If the mean or the median is near to one end of the range or interquartile range, this tells us that the distribution must be skew. If the mean or median is near the lower limit it will be positively skew, if near the upper limit it will be negatively skew.

Spotting skewness

Duration of ulcer: median = 3., mean = 9.4, SD = 14., range = to 7, IQR = 1 to 1 months.

These rules of thumb only work one way: for example, the mean may exceed two standard deviations and the distribution may still be skew.

Gestational age: median = 39, mean = 38.9, SD = 2.29, range = 21 to 44, IQR = 38 to 40 weeks.
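The first rule of thumb is simple enough to write as a one-line check. The function below is a hypothetical helper for illustration, applied to summary values quoted in the lecture (ulcer episodes: mean 3.4, s = 6.5; gestational age: mean 38.9, SD 2.29). It only applies to variables that cannot be negative.

```python
def skewness_hint(mean, sd):
    """Rule of thumb for a variable that cannot take negative values.

    If mean - 2 * sd is negative, the distribution must be positively
    skew. Otherwise the rule says nothing: the distribution may still
    be skew in either direction.
    """
    if mean - 2 * sd < 0:
        return "positively skew"
    return "inconclusive"


# Summary statistics as quoted in the lecture.
print("ulcer episodes: ", skewness_hint(3.4, 6.5))
print("gestational age:", skewness_hint(38.9, 2.29))
```

The second call returns "inconclusive" even though gestational age is negatively skew, which illustrates that the rule only works one way.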