Statistics Lecture 13: Sampling Distributions (Chapter 18)


1. Definitions again

Review the definitions of POPULATION, SAMPLE, PARAMETER and STATISTIC.

STATISTICAL INFERENCE: a situation where the population parameters are unknown, and we draw conclusions from sample outcomes (those are statistics) to make statements about the value of the population parameters.

When random samples are drawn from a population of interest to represent the whole population, they are generally unbiased and representative. The key to understanding why samples behave this way is a difficult concept: THE SAMPLING DISTRIBUTION.

The sampling distribution is a theoretical/conceptual/ideal probability distribution of a statistic. A theoretical probability distribution is what the outcomes (i.e. statistics) of some random process (e.g. drawing a sample from a population) would look like if you could repeat the random process over and over again and had information (that is, statistics) from every possible sample.

Note that the sampling distribution shows how a statistic varies from sample to sample and the pattern of possible values a statistic takes. We do not actually see sampling distributions in real life; they are simulated.

2. Sampling Distributions for Means

Let's suppose that the 1,428 or so people in this example are a population. Here are the mean $\mu_y$ (mu) and standard deviation $\sigma_y$ (sigma) of our population:

                                age
    -------------------------------------------------------------
          Percentiles      Smallest
     1%           19             18
     5%           22             18
    10%           25             18       Obs                1425
    25%           32             18       Sum of Wgt.        1425

    50%           42                      Mean           45.42035
                            Largest       Std. Dev.      17.11534
    75%           56             89
    90%           72             89       Variance       292.9348
    95%           79             89       Skewness       .5865022
    99%           87             89       Kurtosis       2.504332

Suppose we draw a simple random sample of size $n$ from a large population. Call the observed values $y_1, y_2, \ldots, y_n$.

An example: draw a simple random sample (SRS) of 25 from the 1,425 persons with measured age. Measure the average age from the sample of size 25 and compare it to the population average.

        Variable |       Obs        Mean    Std. Dev.       Min        Max
    -------------+--------------------------------------------------------
             age |        25       42.68    14.25868         21         71

A statistic: the mean of the sample of 25, 42.68, is just the old mean (from Chapter 5). Here are the ages of the 25 people who were sampled:

71 60 55 55 43 41 25 30 24 43 24 50 36 66 57 32 29 21 41 43 26 58 43 55 39

We define the mean of a single sample as

$$\bar{y} = \frac{y_1 + y_2 + \cdots + y_n}{n}$$

(this is from Chapter 5), and we define the standard deviation of a single sample as

$$s_y = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}}$$

(also from Chapter 5).

$\bar{y}$ can be thought of as the mean of a single sample of size 25 selected at random from all possible samples of size 25 that could have been generated from the population.

RULE 1: The mean of all possible sample means (all possible $\bar{y}$) is denoted $\mu_{\bar{y}}$, which in theory should be equal to $\mu_y$ (the true population mean). In other words, the mean of sample means ($\mu_{\bar{y}}$) calculated from all possible samples of the same size from the same population should be equal to the true population mean $\mu_y$.

We can check this using a simulation. If I were to draw 10,000 samples of size 25 (with replacement) from our population of 1,428 (with mean age of 45.42035 years), the mean of all 10,000 sample means should, in theory, be equal to our true population mean.

                              r(mean)
    -------------------------------------------------------------
          Percentiles      Smallest
     1%        37.72           34.8
     5%        39.88          35.12
    10%        41.04          35.16       Obs               10000
    25%           43          35.24       Sum of Wgt.       10000

    50%        45.36                      Mean           45.39324
                            Largest       Std. Dev.       3.43353
    75%        47.68           56.6
    90%        49.84          56.64       Variance       11.78913
    95%        51.16          58.16       Skewness       .1262797
    99%        53.76          58.48       Kurtosis       2.951363

This is the overall average of 10,000 sample means from samples of size 25 drawn with replacement from our original population of 1,428. We got 45.39324 as the mean of the 10,000 sample means of all of our samples of size 25; this is very close to the true $\mu$ (or $\mu_y$) of 45.42035.

Here's a graph of the 10,000 sample means from our 10,000 samples of size 25:

[Figure: histogram of the 10,000 sample means; x-axis r(mean), from 35 to 60; y-axis Density, from 0 to .15]

Look familiar?

The mean of all sample means, $\mu_{\bar{y}}$, equals $\mu_y$ (the true population mean), which is why the sample mean is considered an unbiased estimator of $\mu_y$ when it comes from a random sample. If your samples are not random, this relationship will not hold. For our first sample of 25 people, the mean of the sample is 42.68, but the mean of all 10,000 of the sample means is 45.39, and it's not too different from the true population mean of 45.42.

RULE 2. The theoretical standard deviation of all possible $\bar{y}$'s from all possible samples of size $n$ is

$$\sigma_{\bar{y}} = \frac{\sigma_y}{\sqrt{n}}$$

where $\sigma_y$ is the standard deviation of the population. In our population data, $\sigma_y$ is 17.11534, so the theoretical standard deviation for a distribution of all possible sample means from samples of size 25 should be

$$\sigma_{\bar{y}} = \frac{\sigma_y}{\sqrt{n}} = \frac{17.11534}{\sqrt{25}} = 3.423068$$

We can check whether this holds true or not by examining the results of a simulation. From the output above, the standard deviation for our 10,000 sample means (from our samples of size 25) is 3.43353, again very close to what we get from the theory (3.423068). This rule is approximately correct as long as your sample is no larger than 5% of your population.

So please make a note of this:

o A sample has a mean $\bar{y}$ and a standard deviation $s$.
o A population has a mean $\mu_y$ and a standard deviation $\sigma_y$.
o A sampling distribution, or a distribution of all possible sample statistics (in this case the sample mean), also has a mean, denoted $\mu_{\bar{y}}$, and in theory it's equal to $\mu_y$, but with a standard deviation of $\sigma_{\bar{y}} = \sigma_y / \sqrt{n}$.

Our sample (or any real-life sample) is just one single realization of all possible samples from a population of samples.

The standard deviation $\sigma_{\bar{y}} = \sigma_y / \sqrt{n}$ of all the SAMPLE MEANS will be smaller than the standard deviation for a single sample. In other words, it is easier to predict the mean of many observations than it is to predict the value of a single observation (or to predict the average of small samples).

What is causing this? Examine the formula for the standard deviation of the sampling distribution and note the effect of sample size on the standard deviation of all sample means. The bigger the sample size $n$ gets, the smaller $\sigma_{\bar{y}} = \sigma_y / \sqrt{n}$ becomes.

Some things to consider

How close is $\bar{y}$ to $\mu$, or in other words, how accurate will our samples be? To answer this, you will need to know the standard deviation of the population $\sigma_y$ and the sample size $n$. Note how the standard deviation of the sampling distribution changes with sample size. For big samples, the standard deviation for the sample mean will be small, and for small samples, the standard deviation for the sample mean is large.

3. RULE 3 & RULE 4: Normal Distributions and The Central Limit Theorem

Given a simple random sample of size $n$ from a population having mean $\mu_y$ and standard deviation $\sigma_y$, the sample mean $\bar{y}$ will come from a sampling distribution of all possible sample means with mean $\mu_{\bar{y}} = \mu_y$ and standard deviation $\sigma_{\bar{y}} = \sigma_y / \sqrt{n}$.

A. Basic Distributional Result

If the original population had a normal distribution, then the distribution of the sample mean will also be normally distributed. This is good, because it means we can use the normal table to make inferences about a particular sample with a statement of probability or chance.

Example. IQ scores are normally distributed with a mean of 100 and a standard deviation of 15. A sample of 25 persons is drawn. How likely is it to get a sample average of 108 or more? (0.38%, since the sample mean has standard deviation $15/\sqrt{25} = 3$ and $Z = 8/3 \approx 2.67$.) How likely is it for the first score to be 108 or more? (29.8%, since a single score has standard deviation 15 and $Z = 8/15 \approx 0.53$.)

B. The Central Limit Theorem (p. 343)

No matter what the distribution of the original population (recall that our original one is right skewed, with a skewness of about .59), if the sample size is "sufficient", the distribution of the possible sample means will be close to the normal distribution. It is a very powerful theorem and it is the reason why the normal distribution is so well studied.
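The two IQ probabilities in part A can be checked numerically. The following is a minimal Python sketch, not part of the original lecture: the function name `normal_upper_tail` is made up for illustration, and the normal-table lookup is replaced by the standard library's complementary error function.

```python
import math


def normal_upper_tail(x, mean, sd):
    """Return P(X >= x) for a normal distribution with the given mean and sd."""
    z = (x - mean) / sd
    # For a standard normal Z, P(Z >= z) = 0.5 * erfc(z / sqrt(2)).
    return 0.5 * math.erfc(z / math.sqrt(2))


mu, sigma, n = 100.0, 15.0, 25     # IQ population and sample size from the example
se = sigma / math.sqrt(n)          # standard deviation of the sample mean: 15 / 5 = 3

p_mean = normal_upper_tail(108, mu, se)        # sample average of 108 or more
p_single = normal_upper_tail(108, mu, sigma)   # one person's score of 108 or more

print(f"P(sample mean >= 108)  = {p_mean:.4f}")
print(f"P(single score >= 108) = {p_single:.4f}")
```

The only difference between the two calls is the standard deviation: 3 for the mean of 25 scores versus 15 for a single score. The single-score probability computed this way comes out slightly below the 29.8% quoted above, because the table lookup rounds Z to 0.53 first.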

C. Summary

Take a simple random sample from a population with mean $\mu_y$ and standard deviation $\sigma_y$. Let $\bar{y}$ be the average of the sample taken from the population. If either the original population is normally distributed OR the sample size is sufficiently large, then the $\bar{y}$ will be normally distributed with mean $\mu_{\bar{y}} = \mu_y$ and standard deviation $\sigma_{\bar{y}} = \sigma_y / \sqrt{n}$.

If the histogram for the population follows a normal curve, or if the sample size is large enough each time, then the histogram for the possible values of $\bar{y}$ will follow a normal curve that has a mean of $\mu_y$ and a standard deviation of $\sigma_{\bar{y}} = \sigma_y / \sqrt{n}$. Thus, about 68% of the $\bar{y}$ will be within one standard deviation $\sigma_{\bar{y}}$ (one standard error, SE) of the true population mean, about 95% of the $\bar{y}$ will be within two SEs, and 99.7% of the $\bar{y}$ will be within 3 SEs.

Let's go back to our first sample of 25 with its mean of 42.68. The chance of getting a mean that low or lower is found as follows:

(1) Calculate

$$Z = \frac{42.68 - 45.42}{17.11534 / \sqrt{25}} \approx -.80$$

(2) With Z about -.80, do a look-up from the standard normal table and you get .2119 in the area beyond Z.

So the chance (probability) of drawing a sample of size 25 with an average of 42.68 or lower when you were expecting the average to be 45.42 was about 21.19%. Our interpretation is that about 21% of the time you would get a sample average as low as the one we got. This suggests that it's not too unusual to be this far from the true average even though you have done everything correctly (e.g. random sample).

NOTE: The Central Limit Theorem only applies to the distribution of possible sample averages (i.e. the sampling distribution). It says nothing about the distribution of individual scores in either the sample or the population. For example, here is a graph of our age variable for the population, followed by a graph of our sample of size 25 (from the beginning of this lecture):

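[Figure: histogram of age in the population; x-axis age, from 20 to 100; y-axis Density, from 0 to .04]

[Figure: histogram of age in the sample of size 25; x-axis age, from 20 to 70; y-axis Density, from 0 to .15]

Notice: neither one is normal, but we can use the normal curve to help us make statements of chance and accuracy because of the sampling distribution (it's normal as long as the sample size is sufficiently large).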
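The numbers for the first sample of 25 can be reproduced in a few lines of Python. This is a minimal sketch, not the Stata commands behind the lecture output: the variable names are made up for illustration, the 25 ages and the population mean and standard deviation are copied from the lecture, and the normal-table lookup is replaced by `math.erfc`.

```python
import math
import statistics

# Ages of the 25 sampled people, as listed in the lecture.
ages = [71, 60, 55, 55, 43, 41, 25, 30, 24, 43, 24, 50, 36,
        66, 57, 32, 29, 21, 41, 43, 26, 58, 43, 55, 39]

pop_mean = 45.42035   # population mean from the summary output
pop_sd = 17.11534     # population standard deviation from the summary output

n = len(ages)
sample_mean = statistics.mean(ages)    # y-bar
sample_sd = statistics.stdev(ages)     # sample standard deviation (n - 1 divisor)

se = pop_sd / math.sqrt(n)             # sigma_y / sqrt(n)
z = (sample_mean - pop_mean) / se      # standardized distance from the population mean

# P(Z <= z) for a standard normal Z, i.e. the chance of a sample mean this low or lower.
p_lower = 0.5 * math.erfc(-z / math.sqrt(2))

print(f"n = {n}, mean = {sample_mean:.2f}, sd = {sample_sd:.5f}")
print(f"SE = {se:.6f}, Z = {z:.2f}, P(mean this low or lower) = {p_lower:.4f}")
```

Because the sketch keeps the unrounded population mean and Z, its probability differs very slightly from the table value of .2119, which comes from rounding Z to -.80 before the lookup.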

4. A special case of means: The proportion

A proportion could be thought of as the mean of a special kind of population. The population only has values of 1 or 0. If a population has that feature, the population mean is $p$, which is the proportion of 1's in your population. And the population standard deviation is

$$\sigma_p = \sqrt{p \cdot q}$$

where $q = 1.0 - p$. For example:

         clinton |      Freq.     Percent        Cum.
     ------------+-----------------------------------
               0 |        379       43.36       43.36
               1 |        495       56.64      100.00
     ------------+-----------------------------------
           Total |        874      100.00

                              clinton
    -------------------------------------------------------------
          Percentiles      Smallest
     1%            0              0
     5%            0              0
    10%            0              0       Obs                 874
    25%            0              0       Sum of Wgt.         874

    50%            1                      Mean           .5663616
                            Largest       Std. Dev.      .4958603
    75%            1              1
    90%            1              1       Variance       .2458775
    95%            1              1       Skewness      -.2678155
    99%            1              1       Kurtosis       1.071725

Proportions also have a sampling distribution. It's a distribution of sample proportions, and this distribution has a mean of $p$ and a standard deviation of

$$\sigma_{\hat{p}} = \sqrt{\frac{pq}{n}}$$

And if I were to run a simulation of samples of size 25 for 10,000 samples:

                              clinton
    -------------------------------------------------------------
          Percentiles      Smallest
     1%     .2666667       .0833333
     5%     .3529412       .0833333
    10%           .4       .0909091       Obs               10000
    25%     .4705882             .1       Sum of Wgt.       10000

    50%     .5714286                      Mean           .5657526
                            Largest       Std. Dev.      .1289778
    75%     .6470588       .9411765
    90%     .7333333       .9444444       Variance       .0166353
    95%     .7692308              1       Skewness      -.0970473
    99%     .8571429              1       Kurtosis       2.948515

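[Figure: histogram of the 10,000 sample proportions; x-axis r(mean), from 0 to 1; y-axis Density, from 0 to 4]

We can see that proportions behave like the mean: in theory the sampling distribution wants to center on the value of $p$ (the true population proportion) and have a standard deviation of $\sigma_{\hat{p}} = \sqrt{pq/n}$.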
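The idea of simulating a sampling distribution for a proportion can be sketched in Python. This is an illustration under stated assumptions, not the lecture's Stata procedure: each sample is drawn as 25 independent 0/1 values with $P(1) = p = 495/874$ (equivalent to sampling with replacement), the seed is arbitrary, and all variable names are made up. The numbers it prints need not match the lecture's output, which came from the author's own data and procedure.

```python
import math
import random
import statistics

p = 495 / 874     # proportion of 1s in the tabulated population
q = 1 - p
n = 25            # sample size, as stated in the lecture
reps = 10_000     # number of simulated samples

rng = random.Random(12345)   # fixed seed (arbitrary) so the run is reproducible

# For each replicate, draw n values of 0/1 with P(1) = p and record the sample proportion.
sample_props = [
    sum(1 if rng.random() < p else 0 for _ in range(n)) / n
    for _ in range(reps)
]

sim_mean = statistics.mean(sample_props)   # should sit near p
sim_sd = statistics.pstdev(sample_props)   # spread of the simulated proportions
theory_sd = math.sqrt(p * q / n)           # sqrt(pq / n)

print(f"p = {p:.4f}, mean of sample proportions = {sim_mean:.4f}")
print(f"sqrt(pq/n) = {theory_sd:.4f}, sd of sample proportions = {sim_sd:.4f}")
```

Changing `n` shows the effect of sample size directly: the simulated standard deviation shrinks in step with $\sqrt{pq/n}$ as `n` grows.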