Statistics 11 Lecture 18 Sampling Distributions (Chapter 6-2, 6-3) 1. Definitions again


Review the definitions of POPULATION, SAMPLE, PARAMETER and STATISTIC.

STATISTICAL INFERENCE: a situation where the population parameters are unknown, and we draw conclusions from sample outcomes (those are statistics) to make statements about the value of the population parameters. (p. 94 text refers to measuring sample reliability/trustworthiness)

When random samples are drawn from a population of interest to represent the whole population, they are generally unbiased and representative. The key to understanding why samples behave this way is a difficult concept: THE SAMPLING DISTRIBUTION.

The sampling distribution is a theoretical/conceptual/ideal probability distribution of a statistic. A theoretical probability distribution is what the outcomes (i.e. statistics) of some random process (e.g. drawing a sample from a population) would look like if you could repeat the random process over and over again and had information (that is, the statistics) from every possible sample. The sampling distribution shows how a statistic varies from sample to sample and the pattern of possible values a statistic takes. We do not actually see sampling distributions in real life; they are simulated.

2. Sampling Distributions for Means

Generally, the objective in sampling is to estimate a population mean µ from sample information. Let's suppose that the 78,455 or so people in this example are a population. Here are the mean µ (mu) and standard deviation σ (sigma) of our population:

    HINC (population)
    Obs          78455
    Mean         78163.53
    Std. Dev.    63610.55
    Variance     4.05e+09
    50%          63000

[Figure: density histogram of HINC for the population; the distribution is strongly right-skewed.]

Suppose we draw a simple random sample of size n from a large population. A simple random sample is a sample where (1) each member of the population had the same chance of being selected (unbiased) and (2) the selection of one member has no effect on the probability of another member being selected (independent). Since the sample observations come from the same population, we say that the observations are independent, identically distributed (i.i.d.). For the samples in this class, you should assume this condition. Let us call the observed values from the sample \( x_1, x_2, \dots, x_n \).

An example: draw a simple random sample (SRS) of 25 from the 78,455 households with measured household income. Measure the average from the sample of size 25 and compare it to the population average. For this sample (variable hinc, Obs 25, Min 0, Max 385000) the mean comes out at about $82,500.

A statistic: the mean of the sample of 25, about $82,500, is just the plain old mean (from earlier in your text, page 34).

[Listing of hinc for the 25 sampled households, observations 1 to 25; the values range from 0 to 385000.]

We define the mean of a single sample as

\( \bar{x} = \dfrac{x_1 + x_2 + \dots + x_n}{n} \)

and we define the standard deviation of a single sample as

\( S_x = \sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} \)

Both come from an earlier chapter. Our sample mean can be thought of as the mean of a single sample of size 25 selected at random from all possible samples of size 25 that could have been generated from the population.

A. The Expected Value of the Sample Mean

We certainly would have liked to have done better: a sample mean of about $82,500 is not the same as the population mean of $78,163.53. Is the sample mean a good estimator of the true population mean µ?

Theory says: think of a sample from a population as a set of random variables \( X_1, X_2, \dots, X_n \). In other words, while we might know what is possible with respect to household incomes, we don't know what the sample will look like until it's actually drawn. The sample mean (from page 96 of your text) is defined as a combination of random variables:

\( \bar{X} = \dfrac{1}{n}\left[ X_1 + X_2 + \dots + X_n \right] \)

The sample mean, being a linear combination of random variables, is itself a random variable. So now we ask the question: what are the expected value and the variance (or standard deviation) of the sample mean?

\( E(\bar{X}) = \dfrac{1}{n}\left[ E(X_1) + E(X_2) + \dots + E(X_n) \right] \)

Each \( X_i \) is drawn from the same population distribution p(x) with mean µ, so \( E(X_1) = E(X_2) = \dots = E(X_n) = \mu \), and then

\( E(\bar{X}) = \dfrac{1}{n}\left[ \mu + \mu + \dots + \mu \right] = \dfrac{1}{n}\left[ n\mu \right] = \mu \)

The interpretation: on average, the sample mean is expected to be equal to µ.

RULE 1: The mean of all possible sample means (all possible \( \bar{x} \) of the same size sampled from the same population) is denoted \( \mu_{\bar{x}} \), which in theory should be equal to µ (the true population mean).
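RULE 1 can be checked exactly on a population small enough to list every possible sample. The sketch below is only an illustration: the four-value population and the sample size of 2 are made-up assumptions, not the household income data, and sampling is done with replacement so that the draws are i.i.d.

```python
from itertools import product
from statistics import mean

# Hypothetical toy population (an assumption for illustration only).
population = [0, 20, 40, 100]
n = 2  # sample size

mu = mean(population)

# Every possible ordered sample of size n, drawn with replacement.
all_samples = list(product(population, repeat=n))

# One sample mean per possible sample: this is the full sampling distribution.
sample_means = [mean(sample) for sample in all_samples]

mean_of_sample_means = mean(sample_means)
print("population mean:", mu)
print("number of possible samples:", len(all_samples))
print("mean of all sample means:", mean_of_sample_means)
```

Individual sample means range from 0 to 100 here, yet their average over all possible samples matches the population mean, which is what RULE 1 states.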

In other words, the mean of the sample means calculated from all possible samples of the same size from the same population should be equal to the true population mean. We can check this using a simulation. If I were to draw 10,000 samples of size 25 (with replacement) from our population of 78,455 (with mean income of $78,163.53), the mean of all 10,000 sample means should, in theory, be equal to our true population mean.

    r(mean)
    Obs          10000
    Mean         78209.64
    Std. Dev.    12437.97
    Variance     1.55e+08

This is the overall average of 10,000 sample means from samples of size 25 drawn with replacement from our original population of 78,455. We got $78,209.64 as the mean of the 10,000 sample means, which is very close to the true population mean of $78,163.53 (we are off by .059%). Here's a graph of the 10,000 sample means from our 10,000 samples of size 25:

[Figure: density histogram of r(mean), the 10,000 sample means.]

Does it look familiar? The sample mean is considered an unbiased estimator of µ (the true population mean) when it comes from a random sample. If your samples are not random, this relationship will not hold. For our first sample of 25 households, the mean of the sample is about $82,500, but the mean of all 10,000 of the sample means is $78,209.64, and it's not too different from the true population mean of $78,163.53.
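A simulation of this kind can be sketched in a few lines. The household income data are not available here, so the sketch assumes a synthetic right-skewed population (lognormal, with arbitrarily chosen parameters and seed); only the design, 10,000 samples of size 25 drawn with replacement, follows the lecture.

```python
import random
from statistics import fmean

random.seed(18)  # arbitrary seed so the run is repeatable

# Hypothetical right-skewed "income" population; NOT the lecture's HINC data.
population = [random.lognormvariate(11, 0.8) for _ in range(20_000)]
mu = fmean(population)

n_samples, n = 10_000, 25

# random.choices draws with replacement, as in the lecture's simulation.
sample_means = [
    fmean(random.choices(population, k=n)) for _ in range(n_samples)
]

grand_mean = fmean(sample_means)
pct_off = 100 * abs(grand_mean - mu) / mu

print(f"population mean      = {mu:,.2f}")
print(f"mean of sample means = {grand_mean:,.2f}")
print(f"off by {pct_off:.3f}%")
```

With a different seed the two means will differ by a different small amount; the point is only that the gap stays small relative to the spread of the individual sample means.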

B. The (Variance and) Standard Deviation of the Sample Mean

Recall that when we talk about means, we need to talk about standard deviations because they give us a sense of the typical distances between values. For the sampling distribution, we need to recognize that a different sample would give us a different result; the question becomes: how different? The answer is found in calculating the variance of the sampling distribution. Recall that variances add nicely if the random variables are independent:

\( \mathrm{Var}(\bar{X}) = \dfrac{1}{n^2}\left[ \mathrm{Var}(X_1) + \mathrm{Var}(X_2) + \dots + \mathrm{Var}(X_n) \right] = \dfrac{1}{n^2}\left[ \sigma^2 + \sigma^2 + \dots + \sigma^2 \right] = \dfrac{1}{n^2}\left[ n\sigma^2 \right] \)   (page 97 of your text)

This reduces down to

\( \mathrm{Var}(\bar{X}) = \dfrac{\sigma^2}{n} \)

RULE 2: The theoretical standard deviation of all possible \( \bar{x} \)'s from all possible samples of size n is called the STANDARD ERROR or SE (to distinguish it from the standard deviation) and it is:

\( SE = \sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} \)

where σ is the standard deviation of the population. This is paired with the mean of all sample means above. In our population data, σ is 63,610.55, so the theoretical standard deviation for a distribution of all possible sample means from samples of size 25 should be

\( \sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{63{,}610.55}{\sqrt{25}} = 12{,}722.11 \)

We can check whether this holds true by examining the results of the simulation. From the output above, the standard deviation of our 10,000 sample means (from our samples of size 25) is 12,437.97, again very close to what we get from the theory (12,722.11; we are off by about 2%). This rule is approximately correct as long as your sample is no larger than 5% of your population.

So please make a note of this:

o A sample has a mean \( \bar{x} \), a standard deviation s and a variance \( s^2 \).
o A population has a mean µ, a standard deviation σ and a variance \( \sigma^2 \).
o A sampling distribution, or a distribution of all possible sample statistics (in this case the sample mean), also has a mean, denoted \( \mu_{\bar{x}} \), which in theory is equal to µ, but with a standard deviation (called STANDARD ERROR) of \( \sigma_{\bar{x}} = \sigma/\sqrt{n} \).

Your sample (or any real-life sample) is just one single realization of all possible samples from a population of samples.
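The arithmetic behind RULE 2 is short enough to check directly. The sketch below uses the figures as read from the lecture's output (population standard deviation 63,610.55, sample size 25, simulated standard deviation 12,437.97); the extra sample sizes in the loop are assumptions added only to show how the standard error changes with n.

```python
from math import sqrt

sigma = 63_610.55         # population standard deviation (from the lecture)
n = 25                    # sample size (from the lecture)
simulated_sd = 12_437.97  # SD of the 10,000 simulated sample means

# RULE 2: standard error of the sample mean.
se = sigma / sqrt(n)
pct_off = 100 * abs(simulated_sd - se) / se

print(f"theoretical SE = {se:,.2f}")
print(f"simulated SD differs from theory by {pct_off:.1f}%")

# Illustrative sample sizes (assumed): quadrupling n halves the SE.
se_by_size = {size: sigma / sqrt(size) for size in (25, 100, 400)}
for size, value in se_by_size.items():
    print(f"n = {size:>3}: SE = {value:,.2f}")
```

Because n sits under a square root, cutting the standard error in half takes four times as many observations.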
The standard error \( \sigma_{\bar{x}} = \sigma/\sqrt{n} \) of all the SAMPLE MEANS will be smaller than the standard deviation for a single sample and also smaller than the standard deviation for the population. In other words, it is easier to predict the mean of many observations than it is to predict the value of a single observation (or to predict the average of small samples). What is causing this? Examine the formula for the standard error of the sampling distribution and note the effect of sample size on the standard error of all sample means. The bigger the sample size n gets, the smaller \( \sigma_{\bar{x}} = \sigma/\sqrt{n} \) becomes.

3. Normal Distributions and The Central Limit Theorem

Given a simple random sample of size n from a population having mean µ and standard deviation σ, the sample mean \( \bar{x} \) will come from a sampling distribution of all possible sample means with mean µ and standard deviation (called the standard error to make a distinction) \( \sigma_{\bar{x}} = \sigma/\sqrt{n} \).

A. Basic Distributional Result

If the original population had a normal distribution, then the sample mean will also be normally distributed. This is good, because it means we can use the normal table to make inferences about a particular sample with a statement of probability or chance.

Example. IQ scores are normally distributed with a mean of 100 and a standard deviation of 15. A sample of 25 persons is drawn. How likely is it to get a sample average of 108 or more? (Using Z scores, \( Z = (108 - 100)/(15/\sqrt{25}) \approx 2.67 \): 0.4% or .004 from Table IV.) How likely is it for the very first score to be 108 or more? (\( Z = (108 - 100)/15 \approx 0.53 \): 29.8% or .298 from Table IV.)

B. The Central Limit Theorem

No matter what the distribution of the original population (recall our original one is highly right skewed), if the sample size is "large", the distribution of the possible sample means will be close to the normal distribution (often a fairly modest sample size is large enough). It is a very powerful theorem, and it is the reason why the normal distribution is so well studied: we are interested in estimating means, and the CLT helps us to understand what to expect.

C. Normal Approximation Rule

In random samples of size n, the sample mean will fluctuate around the population mean with a standard error of \( \sigma/\sqrt{n} \). Therefore, as n increases in size, the sampling distribution of the sample means concentrates more and more around the population mean (this is why bigger samples are better: they increase accuracy).
The sampling distribution will also become more and more normal.

Let's go back to our first sample of 25 with its mean of about $82,500. The chance of getting a mean that high or higher is found in two steps. (1) Calculate

\( Z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \dfrac{\bar{x} - 78{,}163.53}{63{,}610.55/\sqrt{25}} \approx +0.34 \)

so Z is about .34. (2) Do a look-up from the standard normal table, and you get .367 in the area beyond Z. So the chance (probability) of drawing a sample of size 25 with an average that high or higher, when you were expecting the average to be 78,163.53, was about 36.7%. Your interpretation is that about 37% of the time you would get a sample average as high as the one you got. This suggests that it's not too unusual to be this far from the true average even though you have done everything correctly (e.g. random sample).

NOTE: The Central Limit Theorem only applies to the distribution of possible sample averages (i.e. the sampling distribution). It says nothing about the distribution of individual scores in either the sample or the population. For example, here is a graph of our household income variable for the population followed by a graph of our initial sample of size 25 (from the beginning of this lecture). Note that neither is normal, but the sampling distribution of the means of all possible samples of size 25 is approximately normal.

[Figure: density histogram of HINC for the original population of all households (78,455), with mean 78,163.53.]

[Figure: density histogram of hinc for our one sample of size 25 (Obs 25, Min 0, Max 385000).]
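The table look-ups in Section 3 can be reproduced numerically. The sketch below is one possible way to do it with Python's standard library; the helper name `upper_tail` is made up for this illustration, and the household income case simply starts from the rounded Z of 0.34 quoted in the lecture.

```python
from math import sqrt
from statistics import NormalDist


def upper_tail(x, mu, sigma, n=1):
    """Return (z, P(mean of n observations >= x)) for a normal sampling distribution."""
    z = (x - mu) / (sigma / sqrt(n))
    return z, 1 - NormalDist().cdf(z)


# IQ example: mean 100, SD 15; a sample average of 108 or more with n = 25.
z_mean, p_mean = upper_tail(108, 100, 15, n=25)

# Same threshold for a single score (n = 1).
z_one, p_one = upper_tail(108, 100, 15, n=1)

# Household income sample: area beyond the lecture's rounded Z of 0.34.
p_income = 1 - NormalDist().cdf(0.34)

print(f"sample of 25: z = {z_mean:.2f}, chance = {p_mean:.4f}")
print(f"single score: z = {z_one:.2f}, chance = {p_one:.4f}")
print(f"income sample: area beyond z = 0.34 is {p_income:.3f}")
```

The single-score probability comes out marginally below the table value of .298 because the table look-up rounds Z to 0.53 first, while the code keeps the unrounded 8/15.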