STATISTICAL DATA ANALYSIS IN EXCEL


Microarray Center
STATISTICAL DATA ANALYSIS IN EXCEL
Part 1: Introduction to Statistics
Dr. Petr Nazarov, 14-06-2010, petr.nazarov@crp-sante.lu

COURSE OVERVIEW

Objectives. The course:
- Reviews the statistical basics
- Gives methodological tools for research
- Provides practical skills for fast data analysis
- Runs over 4 sessions (6 hours)

Organization. Lectures are integrated with practical work.

PLEASE ask questions. Understanding is extremely important for the future parts.

OUTLINE

Lecture 1. Reviewing the Basics
- Descriptive statistics
- Exploratory analysis
- Discrete probability distributions
- Continuous probability distributions

Data for the course: http://edu.sablab.net/data/xls

TABULAR AND GRAPHICAL PRESENTATION
Frequency Distribution

Frequency distribution: a tabular summary of data showing the number (frequency) of items in each of several nonoverlapping classes.

Marks: A B C B A B B A B C

Mark | Frequency | Relative frequency | Percent frequency
A | 3 | 0.3 | 30%
B | 5 | 0.5 | 50%
C | 2 | 0.2 | 20%
Total | 10 | 1 | 100%

In MS Excel use the following functions:
=COUNTIF(data,element) to get the number of elements found in the data area
=SUM(data) to get the sum of the values in the data area
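Outside Excel, the same three tables can be sketched in a few lines of Python. This is only an illustration of the counting logic: `Counter` plays the role of =COUNTIF applied to each class, and the variable names are our own.

```python
from collections import Counter

# The ten marks from the slide
marks = list("ABCBABBABC")

# Frequency distribution: one count per class (the role of =COUNTIF)
frequency = Counter(marks)
total = sum(frequency.values())  # the role of =SUM

# Relative and percent frequency distributions
relative = {mark: count / total for mark, count in frequency.items()}
percent = {mark: 100 * share for mark, share in relative.items()}

for mark in sorted(frequency):
    print(mark, frequency[mark], relative[mark], f"{percent[mark]:.0f}%")
```

The relative frequencies always sum to 1, which is a quick sanity check on any frequency table.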

TABULAR AND GRAPHICAL PRESENTATION
Example: Pancreatitis Study

The role of smoking in the etiology of pancreatitis has been recognized for many years. To provide estimates of the quantitative significance of these factors, a hospital-based study was carried out in eastern Massachusetts and Rhode Island between 1975 and 1979. 53 patients who had a hospital discharge diagnosis of pancreatitis were included in this unmatched case-control study. The control group consisted of 217 patients admitted for diseases other than those of the pancreas and biliary tract. Risk factor information was obtained from a standardized interview with each subject, conducted by a trained interviewer. (Adapted from Chap T. Le, Introductory Biostatistics.)

Data: pancreatitis.xls

Pancreatitis patients: Smokers Ex-smokers Ex-smokers Smokers Smokers Smokers Ex-smokers Smokers Smokers Smokers Smokers Smokers Ex-smokers Smokers Smokers Ex-smokers Smokers Smokers Ex-smokers Ex-smokers Smokers Ex-smokers Smokers Smokers Never Smokers Ex-smokers Ex-smokers Smokers Ex-smokers Smokers Smokers Ex-smokers Smokers Smokers Smokers Smokers Smokers Ex-smokers Smokers Smokers Smokers Smokers Smokers Smokers Smokers Smokers Smokers Smokers Never Smokers Smokers Smokers

TABULAR AND GRAPHICAL PRESENTATION
Frequency Distribution (pancreatitis.xls)

Frequency distribution:
Smoking | Cases | Controls
Never | 2 | 56
Ex-smokers | 13 | 80
Smokers | 38 | 81
Total | 53 | 217

Relative frequency distribution:
Smoking | Cases | Controls
Never | 0.038 | 0.258
Ex-smokers | 0.245 | 0.369
Smokers | 0.717 | 0.373
Total | 1 | 1

In MS Excel use =COUNTIF(data,element) to count the elements found in the data area and =SUM(data) to sum the values in the data area.

TABULAR AND GRAPHICAL PRESENTATION
Bar and Pie Charts (pancreatitis.xls)

[Figure: bar chart of percent frequency (0% to 80%) for Never, Ex-smokers and Smokers in the Pancreatitis and Control groups; one pie chart per group with the same three categories.]

In MS Excel use the following steps:
Chart Wizard → Columns → set the data range (both columns of the percent frequency distribution)
Chart Wizard → Pie → set the data range (one column of the percent frequency distribution)

TABULAR AND GRAPHICAL PRESENTATION
Mice Data Series (mice.xls)

Tordoff MG, Bachmanov AA. Survey of calcium & sodium intake and metabolism with bone and body composition data. Project symbol: Tordoff3. Accession number: MPD:103. http://phenome.jax.org

790 mice from different strains.

Parameters: Starting age, Ending age, Starting weight, Ending weight, Weight change, Bleeding time, Ionized Ca in blood, Blood pH, Bone mineral density, Lean tissues weight, Fat weight

TABULAR AND GRAPHICAL PRESENTATION
Histogram (mice.xls)

The following are weights in grams for 790 mice (excerpt):
20.5 23.2 24.6 23.5 26 25.9 23.9 22.8 19.9 20.8 22.4 26 23.8 26.5 26 22.8 22.9 20.9 19.8 22.7 31 22.7 26.3 27.1 18.4 21 18.8 21 21.4 25.7 19.7 27 26.2 21.8 22.2 19.2 21.9 22.6 23.7 26.2 26 27.5 25 20.9 20.6 22.1 20 21.1 24.1 28.8 30.2 20.1 24.2 25.8 21.3 21.8 23.7 23.5 28 27.6 21.6 21 21.3 20.1 20.8 24.5 23.8 29.5 21.4 21.5 24 21.1 18.9 19.5 32.3 28 27.1 28.2 22.9 19.9 20.4 21.3 20.6 22.8 25.8 24.1 23.5 24.2 22 20.3

Sorted weights show that the values range from 10 to 49.6 g. Let us divide the weights into bins:

Weight, g | Frequency
<=10 | 1
10-20 | 237
20-30 | 417
30-40 | 124
40-50 | 11
More | 0

TABULAR AND GRAPHICAL PRESENTATION
Histogram

Now, let us use bin size = 1 gram:

Bin | Frequency
10 | 1
11 | 13
12 | 12
13 | 25
14 | 29
... | ...
46 | 1
47 | 1
48 | 0
49 | 1
50 | 1
More | 0

[Figure: histogram of frequency (0 to 60) versus weight, g (bins 10 to 50).]

In Excel use the following steps:
1. Specify a column of bin (interval) upper limits
2. Tools → Data Analysis → Histogram: select the input data, bins, and output (the Analysis ToolPak should be installed)
3. Use Chart Wizard → Columns to visualize the results
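The binning rule behind the histogram tool can be sketched in Python as follows. We assume, as the tables above suggest, that each bin is labelled by its upper limit and collects the values above the previous limit and up to that limit, with a final "More" bin for anything larger. Only the first ten weights of the excerpt are used, so the counts are illustrative and not those of the full data set.

```python
from bisect import bisect_left

# First ten weights (g) from the slide's excerpt of mice.xls
weights = [20.5, 23.2, 24.6, 23.5, 26, 25.9, 23.9, 22.8, 19.9, 20.8]
upper_limits = [10, 20, 30, 40, 50]  # bin upper limits, as in the 10 g table

# One counter per bin plus a final "More" bin for values above the last limit
counts = [0] * (len(upper_limits) + 1)
for w in weights:
    # Index of the first upper limit that is >= w, so each bin is (lower, upper]
    counts[bisect_left(upper_limits, w)] += 1

labels = [f"<= {u}" for u in upper_limits] + ["More"]
for label, count in zip(labels, counts):
    print(label, count)
```

Changing `upper_limits` to `list(range(10, 51))` gives the 1 g bins used on this slide.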

TABULAR AND GRAPHICAL PRESENTATION
Cumulative Frequency Distribution

Cumulative frequency distribution: a tabular summary of quantitative data showing the number of items with values less than or equal to the upper class limit of each class.

[Figure: histogram of frequency versus weight, g, next to the corresponding ogive (cumulative relative frequency, 0 to 1, versus weight, g, 10 to 50).]
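As a small illustration, the cumulative and cumulative relative frequencies plotted in the ogive can be derived from the 10 g bin counts given earlier (1, 237, 417, 124, 11). The Python sketch below uses a running sum; the variable names are our own.

```python
from itertools import accumulate

upper_limits = [10, 20, 30, 40, 50]   # upper class limits, g
frequency = [1, 237, 417, 124, 11]    # counts per 10 g bin from the histogram table

# Running total: number of mice with weight <= each upper class limit
cumulative = list(accumulate(frequency))
n = cumulative[-1]

# Dividing by the total gives the values plotted in the ogive
cumulative_relative = [c / n for c in cumulative]

for limit, c, r in zip(upper_limits, cumulative, cumulative_relative):
    print(limit, c, round(r, 3))
```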

TABULAR AND GRAPHICAL PRESENTATION
Scatter Plot (mice.xls)

Let us look at the mutual dependency of the Starting and Ending weights.

[Figure: scatter plot of Ending weight (0 to 50) versus Starting weight (0 to 50).]

In Excel use the following steps:
Select the data region → use Chart Wizard → XY (Scatter)

TABULAR AND GRAPHICAL PRESENTATION
Crosstabulation (pancreatitis.xls)

Smoking | Disease: other | Disease: pancreatitis | Total
Ex-smokers | 80 | 13 | 93
Never | 56 | 2 | 58
Smokers | 81 | 38 | 119
Total | 217 | 53 | 270

In Excel use the following steps:
1. Data → Pivot Table and PivotChart → MS Office list + Pivot Table
2. Set the range, including the headers of the data
3. Select the output and set the layout by dragging and dropping the names into the table

NUMERICAL MEASURES
Population and Sample

Population parameter: a numerical value used as a summary measure for a population (e.g., the population mean µ, variance σ², standard deviation σ).

Sample statistic: a numerical value used as a summary measure for a sample (e.g., the sample mean m, the sample variance s², and the sample standard deviation s).

POPULATION: µ = mean, σ² = variance, N = number of elements (usually N = ∞)
SAMPLE: $\bar{x}$ (or m) = mean, s² = variance, n = number of elements

Example (mice.xls): the population is all existing laboratory Mus musculus; the sample is 790 mice from different strains (http://phenome.jax.org).

ID | Strain | Sex | Starting age | Ending age | Starting weight | Ending weight | Weight change | Bleeding time | Ionized Ca in blood | Blood pH | Bone mineral density | Lean tissues weight | Fat weight
1 | 129S1/SvImJ | f | 66 | 116 | 19.3 | 20.5 | 1.062 | 64 | 1.2 | 7.24 | 0.0605 | 14.5 | 4.4
2 | 129S1/SvImJ | f | 66 | 116 | 19.1 | 20.8 | 1.089 | 78 | 1.15 | 7.27 | 0.0553 | 13.9 | 4.4
3 | 129S1/SvImJ | f | 66 | 108 | 17.9 | 19.8 | 1.106 | 90 | 1.16 | 7.26 | 0.0546 | 13.8 | 2.9
368 | 129S1/SvImJ | f | 72 | 114 | 18.3 | 21 | 1.148 | 65 | 1.26 | 7.22 | 0.0599 | 15.4 | 4.2
369 | 129S1/SvImJ | f | 72 | 115 | 20.2 | 21.9 | 1.084 | 55 | 1.23 | 7.3 | 0.0623 | 15.6 | 4.3
370 | 129S1/SvImJ | f | 72 | 116 | 18.8 | 22.1 | 1.176 | | 1.21 | 7.28 | 0.0626 | 16.4 | 4.3
371 | 129S1/SvImJ | f | 72 | 119 | 19.4 | 21.3 | 1.098 | 49 | 1.24 | 7.24 | 0.0632 | 16.6 | 5.4
372 | 129S1/SvImJ | f | 72 | 122 | 18.3 | 20.1 | 1.098 | 73 | 1.17 | 7.19 | 0.0592 | 16 | 4.1
4 | 129S1/SvImJ | f | 66 | 109 | 17.2 | 18.9 | 1.099 | 41 | 1.25 | 7.29 | 0.0513 | 14 | 3.2
5 | 129S1/SvImJ | f | 66 | 112 | 19.7 | 21.3 | 1.081 | 129 | 1.14 | 7.22 | 0.0501 | 16.3 | 5.2
10 | 129S1/SvImJ | m | 66 | 112 | 24.3 | 24.7 | 1.016 | 119 | 1.13 | 7.24 | 0.0533 | 17.6 | 6.8
364 | 129S1/SvImJ | m | 72 | 114 | 25.3 | 27.2 | 1.075 | 64 | 1.25 | 7.27 | 0.0596 | 19.3 | 5.8
365 | 129S1/SvImJ | m | 72 | 115 | 21.4 | 23.9 | 1.117 | 48 | 1.25 | 7.28 | 0.0563 | 17.4 | 5.7
366 | 129S1/SvImJ | m | 72 | 118 | 24.5 | 26.3 | 1.073 | 59 | 1.25 | 7.26 | 0.0609 | 17.8 | 7.1
367 | 129S1/SvImJ | m | 72 | 122 | 24 | 26 | 1.083 | 69 | 1.29 | 7.26 | 0.0584 | 19.2 | 4.6
6 | 129S1/SvImJ | m | 66 | 116 | 21.6 | 23.3 | 1.079 | 78 | 1.15 | 7.27 | 0.0497 | 17.2 | 5.7
7 | 129S1/SvImJ | m | 66 | 107 | 22.7 | 26.5 | 1.167 | 90 | 1.18 | 7.28 | 0.0493 | 18.7 | 7
8 | 129S1/SvImJ | m | 66 | 108 | 25.4 | 27.4 | 1.079 | 35 | 1.24 | 7.26 | 0.0538 | 18.9 | 7.1
9 | 129S1/SvImJ | m | 66 | 109 | 24.4 | 27.5 | 1.127 | 43 | 1.29 | 7.29 | 0.0539 | 19.5 | 7.1

NUMERICAL MEASURES
Measures of Location

Mean: a measure of central location computed by summing the data values and dividing by the number of observations.

Population mean: $\mu = \frac{\sum x_i}{N}$
Sample mean: $\bar{x} = m = \frac{\sum x_i}{n}$
Sample proportion: $p = \frac{\sum (x_i = \text{true})}{n}$

Median: a measure of central location provided by the value in the middle when the data are arranged in ascending order.

Mode: a measure of location, defined as the value that occurs with greatest frequency.

Weight: 12 16 19 22 23 23 24 32 36 42 63 68
Mode = 23, Median = 23.5, Mean = 31.7
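The three measures for the weight example can be checked with Python's standard `statistics` module, whose `mean`, `median` and `mode` correspond to Excel's =AVERAGE, =MEDIAN and =MODE. This is a sketch for verification only.

```python
import statistics

weights = [12, 16, 19, 22, 23, 23, 24, 32, 36, 42, 63, 68]

mean = statistics.mean(weights)      # sum of the values divided by n
median = statistics.median(weights)  # average of the two middle values, since n is even
mode = statistics.mode(weights)      # most frequent value

print(round(mean, 1), median, mode)
```

The mean is pulled well above the median by the two largest values (63 and 68), which is why the median is often preferred for skewed data.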

NUMERICAL MEASURES
Measures of Location (mice.xls)

[Figure: histogram and p.d.f. approximation of mouse weight, g (10 to 40), with the median, mean and mode marked.]
[Figure: density of bleeding time (0 to 200; N = 760, bandwidth = 5.347), with median = 55, mean = 61 and mode = 48.]

Female proportion: p_f = 0.501

In Excel use the following functions:
=AVERAGE(data)
=MEDIAN(data)
=MODE(data)

NUMERICAL MEASURES
Quantiles, Quartiles and Percentiles

Percentile: a value such that at least p% of the observations are less than or equal to this value, and at least (100-p)% of the observations are greater than or equal to this value. The 50th percentile is the median.

Quartiles: the 25th, 50th, and 75th percentiles, referred to as the first quartile, the second quartile (median), and the third quartile, respectively.

In Excel use the following function:
=PERCENTILE(data,p), where p is given as a fraction between 0 and 1

Weight: 12 16 19 22 23 23 24 32 36 42 63 68
Q1 = 21, Q2 = 23.5, Q3 = 39
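Quartiles of small samples depend on the convention used. The Python sketch below compares two common ones for the weight example: linear interpolation between ranked values (`method="inclusive"` in the `statistics` module, which to our understanding is the rule behind Excel's PERCENTILE) and the medians of the lower and upper halves. That the slide's Q1 = 21 and Q3 = 39 come from the second convention with rounding is our inference, not something the slide states.

```python
import statistics

weights = sorted([12, 16, 19, 22, 23, 23, 24, 32, 36, 42, 63, 68])

# Convention 1: linear interpolation between ranked values
q_interpolated = statistics.quantiles(weights, n=4, method="inclusive")

# Convention 2: medians of the lower half, the whole sample and the upper half
half = len(weights) // 2
q_halves = (
    statistics.median(weights[:half]),
    statistics.median(weights),
    statistics.median(weights[-half:]),
)

print(q_interpolated)  # quartiles by interpolation
print(q_halves)        # quartiles by the median-of-halves rule
```

Both conventions agree on the median; they differ only in Q1 and Q3, and the difference shrinks as the sample grows.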

NUMERICAL MEASURES
Measures of Variability

Interquartile range (IQR): a measure of variability, defined to be the difference between the third and first quartiles.
$IQR = Q_3 - Q_1$

Variance: a measure of variability based on the squared deviations of the data values about the mean.
Population: $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$
Sample: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$

Standard deviation: a measure of variability computed by taking the positive square root of the variance.
Sample standard deviation: $s = \sqrt{s^2}$
Population standard deviation: $\sigma = \sqrt{\sigma^2}$

Weight: 12 16 19 22 23 23 24 32 36 42 63 68
IQR = 18, Variance = 320.2, St. dev. = 17.9

In Excel use the following functions:
=VAR(data), =STDEV(data)
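To make the difference between the two denominators concrete, here is a short Python sketch that applies both formulas to the weight example. The values on the slide (320.2 and 17.9) are the sample versions, which is also what =VAR and =STDEV return; Excel's =VARP and =STDEVP give the population versions.

```python
import statistics

weights = [12, 16, 19, 22, 23, 23, 24, 32, 36, 42, 63, 68]

n = len(weights)
mean = sum(weights) / n
sum_sq = sum((x - mean) ** 2 for x in weights)  # sum of squared deviations

sample_var = sum_sq / (n - 1)   # s^2, divides by n - 1
population_var = sum_sq / n     # sigma^2, divides by N
sample_sd = sample_var ** 0.5   # s

print(round(sample_var, 1), round(sample_sd, 1), round(population_var, 1))

# The standard library gives the same sample variance
assert abs(sample_var - statistics.variance(weights)) < 1e-9
```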

NUMERICAL MEASURES
Measures of Variability

Coefficient of variation: a measure of relative variability computed by dividing the standard deviation by the mean.
$CV = \frac{\text{Standard deviation}}{\text{Mean}} \times 100\%$

Weight: 12 16 19 22 23 23 24 32 36 42 63 68
CV = 57%

Median absolute deviation (MAD): a robust measure of the variability of a univariate sample of quantitative data.
$MAD = \text{median}(|x_i - \text{median}(x)|)$

Set 1: 23 12 22 12 21 18 22 20 12 19 14 13 17
Set 2: 23 12 22 12 21 81 22 20 12 19 14 13 17

Measure | Set 1 | Set 2
Mean | 17.3 | 22.2
Median | 18 | 19
St.dev. | 4.23 | 18.18
MAD | 5.93 | 5.93
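The robustness of the MAD can be reproduced with the Python sketch below. One assumption is needed to match the table: the raw median absolute deviation of both sets is 4, and the reported 5.93 equals 4 × 1.4826, the constant commonly used to make the MAD comparable to the standard deviation for normal data. The slide does not mention this constant, so the `scale` parameter is our inference.

```python
import statistics

weights = [12, 16, 19, 22, 23, 23, 24, 32, 36, 42, 63, 68]
set1 = [23, 12, 22, 12, 21, 18, 22, 20, 12, 19, 14, 13, 17]
set2 = [23, 12, 22, 12, 21, 81, 22, 20, 12, 19, 14, 13, 17]  # 18 mistyped as 81

def cv(data):
    """Coefficient of variation in percent: standard deviation / mean * 100."""
    return 100 * statistics.stdev(data) / statistics.mean(data)

def mad(data, scale=1.4826):
    """Median absolute deviation, scaled (assumed) by the normal-consistency constant."""
    med = statistics.median(data)
    return scale * statistics.median([abs(x - med) for x in data])

print(round(cv(weights)))  # CV of the weight example, in percent
for name, data in (("Set 1", set1), ("Set 2", set2)):
    print(name, round(statistics.stdev(data), 2), round(mad(data), 2))
```

A single mistyped value multiplies the standard deviation by more than four, while the MAD does not change at all.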

NUMERICAL MEASURES
Measures of Variability

Skewness: a measure of the shape of a data distribution. Data skewed to the left result in negative skewness; a symmetric data distribution results in zero skewness; and data skewed to the right result in positive skewness.

$\text{Skewness} = \frac{n}{(n-1)(n-2)} \sum_i \left( \frac{x_i - \bar{x}}{s} \right)^3$

(Adapted from Anderson et al., Statistics for Business and Economics.)
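The skewness formula translates directly into code. The Python sketch below applies it to the weight example used on the earlier slides (our choice of data, since this slide gives none) and to a small symmetric sample.

```python
import statistics

def skewness(data):
    """Sample skewness: n / ((n-1)(n-2)) * sum(((x - mean) / s)^3)."""
    n = len(data)
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in data)

weights = [12, 16, 19, 22, 23, 23, 24, 32, 36, 42, 63, 68]

print(skewness(weights))    # positive: the long tail is on the right
print(skewness([1, 2, 3]))  # zero: the sample is symmetric
```

The positive value for the weights agrees with the mean (31.7) lying above the median (23.5).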

NUMERICAL MEASURES
z-score

z-score: a value computed by dividing the deviation about the mean $(x_i - \bar{x})$ by the standard deviation s. A z-score is referred to as a standardized value and denotes the number of standard deviations $x_i$ is from the mean.

$z_i = \frac{x_i - \bar{x}}{s}$

Chebyshev's theorem: for any data set, at least $(1 - 1/z^2)$ of the data values must be within z standard deviations from the mean, where z is any value > 1.

For ANY distribution:
- At least 75% of the values are within z = 2 standard deviations from the mean
- At least 89% of the values are within z = 3 standard deviations from the mean
- At least 94% of the values are within z = 4 standard deviations from the mean
- At least 96% of the values are within z = 5 standard deviations from the mean

Weight | z-score
12 | -1.10
16 | -0.88
19 | -0.71
22 | -0.54
23 | -0.48
23 | -0.48
24 | -0.43
32 | 0.02
36 | 0.24
42 | 0.58
63 | 1.75
68 | 2.03
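As a check of the table and of Chebyshev's bound, the z-scores can be recomputed in Python. The helper names `chebyshev_bound` and `share_within` are ours, introduced only for this sketch.

```python
import statistics

weights = [12, 16, 19, 22, 23, 23, 24, 32, 36, 42, 63, 68]

mean = statistics.mean(weights)
s = statistics.stdev(weights)  # sample standard deviation
z_scores = [(x - mean) / s for x in weights]

def chebyshev_bound(z):
    """Minimum share of values within z standard deviations of the mean (z > 1)."""
    if z <= 1:
        raise ValueError("Chebyshev's theorem requires z > 1")
    return 1 - 1 / z ** 2

def share_within(z):
    """Observed share of the sample with |z-score| <= z."""
    return sum(abs(v) <= z for v in z_scores) / len(z_scores)

print([round(v, 2) for v in z_scores])
for z in (2, 3, 4, 5):
    print(z, round(chebyshev_bound(z), 2), round(share_within(z), 2))
```

The observed shares are never below the bound; Chebyshev's theorem is a guarantee for any distribution, so it is usually far from tight.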

NUMERICAL MEASURES
Detection of Outliers

For bell-shaped distributions:
- Approximately 68% of the values are within 1 st.dev. from the mean
- Approximately 95% of the values are within 2 st.dev. from the mean
- Almost all data points are inside 3 st.dev. from the mean

[Figure: example of a Gaussian distribution.]

Outlier: an unusually small or unusually large data value. For bell-shaped distributions, data points with |z| > 3 can be considered outliers.

Weight | z-score
23 | 0.04
12 | -0.53
22 | -0.01
12 | -0.53
21 | -0.06
81 | 3.10
22 | -0.01
20 | -0.11
12 | -0.53
19 | -0.17
14 | -0.43
13 | -0.48
17 | -0.27

NUMERICAL MEASURES
Exploratory Data Analysis

Five-number summary: an exploratory data analysis technique that uses five numbers to summarize the data: smallest value, first quartile, median, third quartile, and largest value.

Example (children.xls): Min. = 12, Q1 = 25, Median = 32, Q3 = 46, Max. = 79

In Excel use: Tools → Data Analysis → Descriptive Statistics

Box plot: a graphical summary of data based on the five-number summary.

[Figure: box plot with Min, Q1, Q2, Q3 and Max marked; the whiskers extend at most 1.5 IQR beyond the box.]

In Excel use (indirectly): Chart Wizard → Stock → Open-high-low-close, with
open = Q3
high = Q3 + 1.5*IQR
low = Q1 - 1.5*IQR
close = Q1

NUMERICAL MEASURES
Example: Mice Weight (mice.xls)

Example: build a box plot for the weights of male and female mice.

1. Build five-number summaries for males and females:

Measure | Female | Male
Min | 10.0 | 12.0
Q1 | 17.2 | 23.8
Q2 | 20.7 | 27.1
Q3 | 23.3 | 31.2
Max | 41.5 | 49.6

2. Combine the numbers in the following order, so that the whiskers never extend beyond the observed minimum and maximum:
open = Q3
high = min(Q3 + 1.5*(Q3-Q1), Max)
low = max(Q1 - 1.5*(Q3-Q1), Min)
close = Q1

3. In Excel use: Chart Wizard → Stock → Open-high-low-close. Put series in rows. Adjust colors, etc.

[Figure: box plot of mouse weight, g (0 to 45), for Female and Male.]
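The four series values for the stock chart can be computed as in the following Python sketch. It assumes the whisker ends are the 1.5 IQR fences clipped to the observed minimum and maximum, which is a simplification: a conventional box plot ends each whisker at the most extreme data point inside the fence. The function name `whiskers` is ours.

```python
def whiskers(minimum, q1, q3, maximum):
    """Whisker ends: the 1.5*IQR fences, clipped to the observed range."""
    iqr = q3 - q1
    low = max(q1 - 1.5 * iqr, minimum)
    high = min(q3 + 1.5 * iqr, maximum)
    return low, high

# Five-number summaries from the slide: (Min, Q1, Q2, Q3, Max)
summaries = {
    "Female": (10.0, 17.2, 20.7, 23.3, 41.5),
    "Male": (12.0, 23.8, 27.1, 31.2, 49.6),
}

series = {}
for sex, (mn, q1, q2, q3, mx) in summaries.items():
    low, high = whiskers(mn, q1, q3, mx)
    # Order expected by the open-high-low-close stock chart
    series[sex] = {"open": q3, "high": high, "low": low, "close": q1}
    print(sex, series[sex])
```

For the females the lower fence falls below the smallest weight, so the whisker stops at the minimum; both maxima lie above the upper fences and would be drawn as outliers.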

NUMERICAL MEASURES
Measure of Association between 2 Variables

Covariance: a measure of linear association between two variables. Positive values indicate a positive relationship; negative values indicate a negative relationship.

Population: $\sigma_{xy} = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N}$
Sample: $s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$

[Figure (mice.xls): scatter plot of Ending weight (0 to 60) versus Starting weight (0 to 50).]

In Excel use the function:
=COVAR(data_x,data_y)

s_xy = 39.8 (hard to interpret)

NUMERICAL MEASURES
Measure of Association between 2 Variables

Correlation (Pearson product moment correlation coefficient): a measure of linear association between two variables that takes on values between -1 and +1. Values near +1 indicate a strong positive linear relationship; values near -1 indicate a strong negative linear relationship; and values near zero indicate the lack of a linear relationship.

Population: $\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y} = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{\sigma_x \sigma_y N}$
Sample: $r_{xy} = \frac{s_{xy}}{s_x s_y} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{s_x s_y (n - 1)}$

[Figure (mice.xls): scatter plot of Ending weight (0 to 60) versus Starting weight (0 to 50).]

In Excel use the function:
=CORREL(data_x,data_y)

r_xy = 0.94
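The sample formulas for covariance and correlation can be sketched in Python as follows. The data are the starting and ending weights of the five mice with IDs 1 to 5 from the mice.xls excerpt, so the result illustrates the computation only and is not the 0.94 obtained for all 790 mice. Note also that Excel's COVAR is documented as dividing by n rather than n - 1, so it returns the population form.

```python
import math

# Starting and ending weights (g) of mice with IDs 1-5 from the mice.xls excerpt
start = [19.3, 19.1, 17.9, 17.2, 19.7]
end = [20.5, 20.8, 19.8, 18.9, 21.3]

def sample_cov(x, y):
    """Sample covariance s_xy with the n - 1 denominator."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

def pearson_r(x, y):
    """Pearson correlation r_xy = s_xy / (s_x * s_y)."""
    # sample_cov(x, x) is the sample variance of x
    return sample_cov(x, y) / math.sqrt(sample_cov(x, x) * sample_cov(y, y))

print(sample_cov(start, end))  # depends on the units of both variables
print(pearson_r(start, end))   # unit-free, always between -1 and +1
```

Unlike the covariance, the correlation does not change if the weights are expressed in milligrams instead of grams, which is what makes it easier to interpret.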

NUMERICAL MEASURES
Correlation Coefficient

If we have only 2 data points in the x and y data sets, what values would you expect for the correlation between x and y?

[Figure source: Wikipedia.]

DISCRETE PROBABILITY DISTRIBUTION

Discrete and continuous probability distributions
- discrete probability distribution
- continuous probability distribution
- normal probability distribution

RANDOM VARIABLES
Random Variables

Random variable: a numerical description of the outcome of an experiment. A random variable is always a numerical measure.

Discrete random variable: a random variable that may assume either a finite number of values or an infinite sequence of values.
Examples: roll of a die; number of calls to a reception per hour.

Continuous random variable: a random variable that may assume any numerical value in an interval or collection of intervals.
Examples: time between calls to a reception; volume of a sample in a tube; weight, height, blood pressure, etc.

DISCRETE PROBABILITY DISTRIBUTIONS
Discrete Probability Distribution

Probability distribution: a description of how the probabilities are distributed over the values of the random variable.

Probability function: a function, denoted by f(x), that provides the probability that x assumes a particular value for a discrete random variable.

Required conditions: $f(x) \ge 0$ and $\sum f(x) = 1$

Example 1: roll of a die. Random variable X: x = 1, x = 2, x = 3, x = 4, x = 5, x = 6.
[Figure: probability distribution for a die roll; probability function f(x) (0 to 0.2) versus variable x (0 to 7).]

Example 2: number of cells under the microscope. Random variable X: x = 0, x = 1, x = 2, x = 3.
[Figure: probability distribution for the number of cells; probability function f(x) (0 to 0.5) versus variable x (0 to 7).]
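The two required conditions are easy to verify in code. The Python sketch below does so for the die roll, assuming a fair die with f(x) = 1/6 for every face, which is what the bar heights in the figure suggest; exact fractions avoid rounding problems in the sum. The helper `is_probability_function` is ours.

```python
from fractions import Fraction

# Probability function of a fair die (assumption: all six faces equally likely)
die = {x: Fraction(1, 6) for x in range(1, 7)}

def is_probability_function(f):
    """Check the two conditions: f(x) >= 0 for all x, and the values sum to 1."""
    return all(p >= 0 for p in f.values()) and sum(f.values()) == 1

print(is_probability_function(die))

# Probability of an event = sum of f(x) over the outcomes in the event
p_at_most_two = sum(p for x, p in die.items() if x <= 2)
print(p_at_most_two)
```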

CONTINUOUS PROBABILITY DISTRIBUTIONS
Probability Density

Probability density function: a function used to compute probabilities for a continuous random variable. The area under the graph of a probability density function over an interval represents probability.

The total area under the curve equals 1: $\int f(x)\,dx = 1$

[Figure: probability density (0 to 0.3) versus variable x (0 to 1.4).]

CONTINUOUS PROBABILITY DISTRIBUTIONS
Normal Probability Distribution

Normal probability distribution: a continuous probability distribution. Its probability density function is bell shaped and determined by its mean µ and standard deviation σ.

$f(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{(x - \mu)^2}{2\sigma^2}}$

In Excel use the function:
=NORMDIST(x,m,s,false) for the probability density function
=NORMDIST(x,m,s,true) for the cumulative probability function of the normal distribution (area from the left up to x)
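For readers who want to check the density formula numerically, here is a Python sketch that implements it directly and compares it with `statistics.NormalDist` from the standard library, whose `pdf` and `cdf` methods play the roles of =NORMDIST(x,m,s,false) and =NORMDIST(x,m,s,true). The parameter values are arbitrary examples.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Normal density: exp(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

mu, sigma = 2.0, 0.5  # example parameters, chosen arbitrarily
dist = NormalDist(mu, sigma)

for x in (1.0, 2.0, 2.7):
    # density from the formula, density from the library, area to the left of x
    print(x, normal_pdf(x, mu, sigma), dist.pdf(x), dist.cdf(x))
```

The density is highest at x = µ, and exactly half of the area lies to the left of the mean.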

CONTINUOUS PROBABILITY DISTRIBUTIONS
Standard Normal Probability Distribution

Standard normal probability distribution: a normal distribution with a mean of zero and a standard deviation of one.

$f(x) = \frac{1}{\sqrt{2\pi}} e^{-\frac{x^2}{2}}$

Conversion between x and z: $z = \frac{x - \mu}{\sigma}$, $x = z\sigma + \mu$

In Excel use the function:
=NORMSDIST(z)

CONTINUOUS PROBABILITY DISTRIBUTIONS
Dose Selection

Example. Assume that you have developed an extremely efficient chemical treatment for glioblastoma. During tests on animal models it was found that substance X, which you use, is able to kill all tumor cells (theoretically), but given at a high concentration it leads to the death of the patient due to intoxication. As surviving cancer cells quickly evolve into a resistant form, the efficiency of the treatment is significantly reduced if a second course is given. Therefore the treatment should be performed in one injection.

The experimental data suggest that the average concentration needed for a positive treatment is 1 µg/kg. The concentration needed for effective treatment is, of course, a random variable. Presented in log10 scale and in g/kg, it can be approximated by a normal random variable with mean -6 and standard deviation 0.4.

The 50% lethal dose for a human is 35 µg/kg, and tests on animals suggest that in log10 scale it has a normal distribution as well, with standard deviation 0.3.

Parameter | µg/kg | log10 scale (g/kg)
Mean, positive treatment | 1 | -6
Std, positive treatment | x | 0.4
Mean, lethal dose | 35 | -4.456
Std, lethal dose | x | 0.3

CONTINUOUS PROBABILITY DISTRIBUTIONS
Dose Selection

[Figure: pdf of treatment success (mean -6, std 0.4) and pdf of death due to treatment (mean -4.456, std 0.3), plotted on the log scale from -8 to -3]

In Excel use the function: = NORMDIST(x, mean, std, FALSE)

CONTINUOUS PROBABILITY DISTRIBUTIONS
Dose Selection

Probability to die from the disease = 1 - probability to heal it.
Overdose and disease behaviors are independent =>
P(survive) = P(heal disease) * P(survive treatment)

[Figure: probability of disease recurrence, probability to die from the substance, and probability to survive, plotted on the log scale from -8 to -3]

In Excel use the function: = NORMDIST(x, mean, std, TRUE)
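The dose selection can also be sketched numerically. The snippet below is a minimal illustration, not the slide author's worksheet: it uses the two normal distributions from the example (log10 scale, g/kg), multiplies the two cumulative probabilities under the independence assumption, and scans a grid of doses. The grid step of 0.01 is an arbitrary choice.

```python
from statistics import NormalDist

# Parameters taken from the example (log10 scale, g/kg).
needed = NormalDist(mu=-6.0, sigma=0.4)    # dose needed for a positive treatment
lethal = NormalDist(mu=-4.456, sigma=0.3)  # lethal dose

def p_survive(log_dose):
    """P(survive) = P(heal disease) * P(survive treatment)."""
    p_heal = needed.cdf(log_dose)            # given dose covers the needed dose
    p_tolerate = 1.0 - lethal.cdf(log_dose)  # given dose stays below the lethal dose
    return p_heal * p_tolerate

# Scan log-doses from -8.00 to -3.00 (grid step is an assumption).
grid = [-8.0 + 0.01 * i for i in range(501)]
best_dose = max(grid, key=p_survive)
best_p = p_survive(best_dose)

print(f"best log10 dose = {best_dose:.2f}, P(survive) = {best_p:.3f}")
print(f"dose in ug/kg   = {10 ** best_dose * 1e6:.2f}")
```

The first argument of `cdf` plays the role of `x` in `=NORMDIST(x, mean, std, TRUE)`.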

SAMPLING DISTRIBUTION

Sampling distribution
- sample and population and their parameters
- central limit theorem
- types of sampling

SAMPLING DISTRIBUTION
Parameters

Population parameter: a numerical value used as a summary measure for a population (e.g., the mean µ, variance σ², standard deviation σ, proportion π).

Sample statistic: a numerical value used as a summary measure for a sample (e.g., the sample mean m, the sample variance s², and the sample standard deviation s).

POPULATION (all existing laboratory Mus musculus)
  µ    mean
  σ²   variance
  N    number of elements (usually N = ∞)

SAMPLE (mice.xls: 790 mice from different strains, http://phenome.jax.org)
  x̄, m  mean
  s²   variance
  n    number of elements

ID   Strain       Sex  Starting age  Ending age  Starting weight  Ending weight  Weight change  Bleeding time  Ionized Ca in blood  Blood pH  Bone mineral density  Lean tissues weight  Fat weight
1    129S1/SvImJ  f    66            116         19.3             20.5           1.062          64             1.2                  7.24      0.0605                14.5                 4.4
2    129S1/SvImJ  f    66            116         19.1             20.8           1.089          78             1.15                 7.27      0.0553                13.9                 4.4
3    129S1/SvImJ  f    66            108         17.9             19.8           1.106          90             1.16                 7.26      0.0546                13.8                 2.9
368  129S1/SvImJ  f    72            114         18.3             21             1.148          65             1.26                 7.22      0.0599                15.4                 4.2
369  129S1/SvImJ  f    72            115         20.2             21.9           1.084          55             1.23                 7.3       0.0623                15.6                 4.3
370  129S1/SvImJ  f    72            116         18.8             22.1           1.176          n/a            1.21                 7.28      0.0626                16.4                 4.3
371  129S1/SvImJ  f    72            119         19.4             21.3           1.098          49             1.24                 7.24      0.0632                16.6                 5.4
372  129S1/SvImJ  f    72            122         18.3             20.1           1.098          73             1.17                 7.19      0.0592                16                   4.1
4    129S1/SvImJ  f    66            109         17.2             18.9           1.099          41             1.25                 7.29      0.0513                14                   3.2
5    129S1/SvImJ  f    66            112         19.7             21.3           1.081          129            1.14                 7.22      0.0501                16.3                 5.2
10   129S1/SvImJ  m    66            112         24.3             24.7           1.016          119            1.13                 7.24      0.0533                17.6                 6.8
364  129S1/SvImJ  m    72            114         25.3             27.2           1.075          64             1.25                 7.27      0.0596                19.3                 5.8
365  129S1/SvImJ  m    72            115         21.4             23.9           1.117          48             1.25                 7.28      0.0563                17.4                 5.7
366  129S1/SvImJ  m    72            118         24.5             26.3           1.073          59             1.25                 7.26      0.0609                17.8                 7.1
367  129S1/SvImJ  m    72            122         24               26             1.083          69             1.29                 7.26      0.0584                19.2                 4.6
6    129S1/SvImJ  m    66            116         21.6             23.3           1.079          78             1.15                 7.27      0.0497                17.2                 5.7
7    129S1/SvImJ  m    66            107         22.7             26.5           1.167          90             1.18                 7.28      0.0493                18.7                 7
8    129S1/SvImJ  m    66            108         25.4             27.4           1.079          35             1.24                 7.26      0.0538                18.9                 7.1
9    129S1/SvImJ  m    66            109         24.4             27.5           1.127          43             1.29                 7.29      0.0539                19.5                 7.1

SAMPLING DISTRIBUTION
Example: Making Random Sampling

mice.xls: 790 mice from different strains, http://phenome.jax.org (same table excerpt as on the Parameters slide)

1. Add a column to the table
2. Fill it with =RAND()
3. Sort the whole table by this column
4. Assume that these mice are a population with size N=790. Build 3 samples with n=20
5. Calculate m and s for the ending weight, and p, the proportion of males, for each sample

Point estimator: the sample statistic, such as m, s, or p, that provides the point estimate of the population parameters µ, σ, π.
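The five steps above can be sketched in Python. Because mice.xls is not available here, the population below is synthetic: the field names, the weight distribution (mean 25, standard deviation 5) and the random seed are all assumptions made purely for illustration.

```python
import random
from statistics import mean, stdev

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical stand-in for the 790 mice of mice.xls.
population = [
    {"sex": random.choice("fm"), "ending_weight": round(random.gauss(25.0, 5.0), 1)}
    for _ in range(790)
]

# Steps 1-3: shuffling plays the role of adding a =RAND() column and sorting by it.
shuffled = population[:]
random.shuffle(shuffled)

# Step 4: build 3 non-overlapping samples with n = 20.
n = 20
samples = [shuffled[i * n:(i + 1) * n] for i in range(3)]

# Step 5: point estimates m, s (ending weight) and p (proportion of males).
estimates = []
for sample in samples:
    weights = [mouse["ending_weight"] for mouse in sample]
    m = mean(weights)
    s = stdev(weights)  # sample standard deviation (n - 1 in the denominator)
    p = sum(mouse["sex"] == "m" for mouse in sample) / len(sample)
    estimates.append((m, s, p))
    print(f"m = {m:.2f}, s = {s:.2f}, p = {p:.2f}")
```

Each run with a different seed gives different m, s and p, which is exactly the variability that the sampling distribution describes.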

SAMPLING DISTRIBUTION
Sampling Distribution

Sampling distribution: a probability distribution consisting of all possible values of a sample statistic.

Distribution of m: $E(m) = \mu$, $\sigma_m = \dfrac{\sigma}{\sqrt{n}}$

Distribution of p: $E(p) = \pi$, $\sigma_p = \sqrt{\dfrac{\pi(1-\pi)}{n}}$

[Figure: density plots of the sampling distributions of m and of p (N = 100000 values each; bandwidth 0.1424 and 0.03)]
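A small simulation can make the two properties of the distribution of m tangible. This is a sketch under assumed values: the population is taken to be normal with µ = 25 and σ = 5, the sample size is n = 20, and 20000 repetitions are drawn; none of these numbers come from the slides.

```python
import random
from math import sqrt
from statistics import mean, pstdev

random.seed(42)  # fixed seed for reproducibility

MU, SIGMA = 25.0, 5.0   # assumed population parameters
N_SAMPLE = 20           # sample size n
REPS = 20000            # number of simulated samples

# Draw many samples and keep the mean of each one.
sample_means = []
for _ in range(REPS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N_SAMPLE)]
    sample_means.append(sum(sample) / N_SAMPLE)

empirical_mean = mean(sample_means)   # should be close to mu
empirical_sd = pstdev(sample_means)   # should be close to sigma / sqrt(n)
theoretical_sd = SIGMA / sqrt(N_SAMPLE)

print(f"E(m) ~ {empirical_mean:.3f} (mu = {MU})")
print(f"sd(m) ~ {empirical_sd:.3f} (sigma/sqrt(n) = {theoretical_sd:.3f})")
```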

SAMPLING DISTRIBUTION
Central Limit Theorem

Central limit theorem: in selecting a simple random sample of size n from a population, the sampling distribution of the sample mean m can be approximated by a normal distribution as the sample size becomes large.

In practice, if the sample size is n > 30, the normal distribution is a good approximation for the sample mean for almost any initial distribution (the population variance must be finite).
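The theorem can be checked with a quick simulation on a clearly non-normal population. The sketch below assumes an exponential population with rate 1 (so µ = σ = 1) and n = 30; both choices are illustrative. If the normal approximation holds, about 95% of sample means should fall within 1.96 σ/√n of µ.

```python
import random
from math import sqrt

random.seed(7)  # fixed seed for reproducibility

N_SAMPLE = 30          # sample size n
REPS = 20000           # number of simulated samples
MU = SIGMA = 1.0       # exponential distribution with rate 1 (assumed population)

half_width = 1.96 * SIGMA / sqrt(N_SAMPLE)

inside = 0
for _ in range(REPS):
    m = sum(random.expovariate(1.0) for _ in range(N_SAMPLE)) / N_SAMPLE
    if abs(m - MU) <= half_width:
        inside += 1

coverage = inside / REPS
print(f"fraction of sample means within 1.96*sigma/sqrt(n): {coverage:.3f}")
```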

SAMPLING METHODS
Stratified Sampling

Stratified random sampling: a probability sampling method in which the population is first divided into strata and a simple random sample is then taken from each stratum.

Example: scientific institution with 250 coworkers

Strata                            Sample
Administrative board, 20 people   2
Researchers, 100 people           10
Engineers, 50 people              5
Technicians, 50 people            5
Students, 30 people               3
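The institution example corresponds to sampling 10% of each stratum. A minimal sketch follows; the member labels are hypothetical placeholders, and the 10% rate is inferred from the numbers in the example.

```python
import random

random.seed(3)  # fixed seed for reproducibility

# Strata sizes from the example: 250 coworkers in total.
strata_sizes = {
    "Administrative board": 20,
    "Researchers": 100,
    "Engineers": 50,
    "Technicians": 50,
    "Students": 30,
}
SAMPLE_PERCENT = 10  # sampling rate implied by the example

stratified_sample = {}
for name, size in strata_sizes.items():
    # Hypothetical member labels; real data would list actual people.
    members = [f"{name} #{i}" for i in range(1, size + 1)]
    k = size * SAMPLE_PERCENT // 100
    # Simple random sample without replacement inside the stratum.
    stratified_sample[name] = random.sample(members, k)

sample_sizes = {name: len(chosen) for name, chosen in stratified_sample.items()}
print(sample_sizes)
```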

SAMPLING METHODS
Cluster Sampling

Cluster sampling: a probability sampling method in which the population is first divided into clusters and then a simple random sample of the clusters is taken.

Clusters: Luxembourg-camp, Esch-sur-Alzette, Remich, Diekirch, Mersch, Redange, etc.
Sample: a simple random sample of these clusters

SAMPLING METHODS
Systematic Sampling

Systematic sampling: a probability sampling method in which we randomly select one of the first k elements and then select every k-th element thereafter.

Population: 1 2 3 4 5 6 7 8 9 10 11 12 ... 98 99 100 101
Sample:     1 11 21 31 41 51 61 71 81 91 101
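The procedure is short enough to sketch directly. In the snippet below the function name and the optional `start` argument are illustrative; with k = 10 and the first element chosen as the start, it reproduces the sample shown above.

```python
import random

def systematic_sample(elements, k, start=None):
    """Pick one of the first k elements at random, then every k-th element after it.

    `start` is a 0-based index into the first k elements; if omitted it is
    chosen at random.
    """
    if start is None:
        start = random.randrange(k)
    return elements[start::k]

population = list(range(1, 102))  # elements 1..101, as in the example

# Start fixed at the first element to reproduce the example.
sample = systematic_sample(population, k=10, start=0)
print(sample)

# A random start gives one of the k possible systematic samples.
random_sample = systematic_sample(population, k=10)
```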

SAMPLING METHODS
Convenience Sampling

Convenience sampling: a nonprobability method of sampling whereby elements are selected for the sample on the basis of convenience.

SAMPLING METHODS
Judgment Sampling

Judgment sampling: a nonprobability method of sampling whereby elements are selected for the sample based on the judgment of the person doing the study.

Example: selecting the most confident or most experienced experts.

AN EXAMPLE
Be Careful with Sampling!!!

Spitfire: analysis of damage. Where to put additional protection?

INTERVAL ESTIMATION

Interval estimation
- interval estimation
- population mean: σ known
- population proportion
- population mean: σ unknown
- Student's distribution
- estimation of the size of a sample

POPULATION AND SAMPLE
Parameters

Population parameter: a numerical value used as a summary measure for a population (e.g., the mean µ, variance σ², standard deviation σ, proportion π).

Sample statistic: a numerical value used as a summary measure for a sample (e.g., the sample mean m, the sample variance s², and the sample standard deviation s).

POPULATION (all existing laboratory Mus musculus): µ mean, σ² variance, N number of elements (usually N = ∞)
SAMPLE (mice.txt: 790 mice from different strains, http://phenome.jax.org): x̄ or m mean, s² variance, n number of elements

(Same table excerpt as on the Parameters slide of the sampling distribution part.)

INTERVAL ESTIMATION
Interval Estimation

Interval estimate: an estimate of a population parameter that provides an interval believed to contain the value of the parameter. For the interval estimates in this chapter, it has the form: point estimate ± margin of error.

Margin of error: the ± value added to and subtracted from a point estimate in order to develop an interval estimate of a population parameter.

$$\mu = m \pm m_{error}$$

[Figure: density of the sampling distribution of m (N = 100000, bandwidth = 0.1424)]

σ known: the condition existing when historical data or other information provides a good value for the population standard deviation prior to taking a sample. The interval estimation procedure uses this known value of σ in computing the margin of error.

σ unknown: the condition existing when no good basis exists for estimating the population standard deviation prior to taking the sample. The interval estimation procedure uses the sample standard deviation s in computing the margin of error.


INTERVAL ESTIMATION
Population Mean: σ Known

Confidence level: the confidence associated with an interval estimate. For example, if an interval estimation procedure provides intervals such that 95% of the intervals formed using the procedure will include the population parameter, the interval estimate is said to be constructed at the 95% confidence level.

Confidence interval: another name for an interval estimate.

$$\mu = m \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$

For 95% confidence α = 0.05, which means that in each tail we have 0.025. The corresponding z_{α/2} = 1.96.

[Figure: standard normal density with central area 1 - α = 0.95 and two tails of α/2 = 0.025 each]

In Excel use one of the following functions:
= CONFIDENCE(alpha, σ, n)
= -NORMINV(alpha/2,0,1)*σ/SQRT(n)
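The margin of error for a known σ can be reproduced outside Excel. The sketch below uses the Python standard library; the values σ = 5 and n = 25 are hypothetical and serve only to show the calculation.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error_known_sigma(sigma, n, alpha=0.05):
    """Margin of error z_{alpha/2} * sigma / sqrt(n), the role of =CONFIDENCE(alpha, sigma, n)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # upper alpha/2 quantile of N(0, 1)
    return z * sigma / sqrt(n)

# Critical value for 95% confidence: 0.025 in each tail.
z_95 = NormalDist().inv_cdf(0.975)

# Hypothetical inputs: known sigma = 5, sample size n = 25.
E = margin_of_error_known_sigma(sigma=5.0, n=25)

print(f"z = {z_95:.2f}, margin of error = {E:.2f}")
```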

INTERVAL ESTIMATION
Population Proportion

$$\sigma_p = \sqrt{\frac{\pi(1-\pi)}{n}} \approx \sqrt{\frac{p(1-p)}{n}}$$

$$\pi = p \pm z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}, \qquad \text{if } np \ge 5 \text{ and } n(1-p) \ge 5$$

[Figure: sampling distribution of the proportion p (N = 100000, bandwidth = 0.03)]

Practical work: define the 95% confidence interval for the proportion of never-smoking people coming to the hospital.

pancreatitis.txt
n = 270
p(never) = 0.214815
s_p = 0.024994
E = 0.048988

π = 21.5 ± 4.9 % (for 95% confidence, z_{0.025} = 1.96)
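The numbers of the practical work can be re-derived from n and p alone. The sketch below takes n = 270 and p = 0.214815 from the slide; everything else is computed, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n = 270        # number of patients in pancreatitis.txt
p = 0.214815   # observed proportion of never-smokers

# Rule of thumb for using the normal approximation.
approximation_ok = n * p >= 5 and n * (1 - p) >= 5

s_p = sqrt(p * (1 - p) / n)          # standard error of the proportion
z = NormalDist().inv_cdf(0.975)      # z_{0.025} for 95% confidence
E = z * s_p                          # margin of error

lower, upper = p - E, p + E
print(f"s_p = {s_p:.6f}, E = {E:.6f}")
print(f"95% CI: {100 * lower:.1f}% .. {100 * upper:.1f}%")
```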

INTERVAL ESTIMATION
Population Proportion: Some Practical Aspects

$$\pi = p \pm z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}$$

1. The normal distribution is applicable only when enough data points are observed. The rule of thumb is: np ≥ 5 and n(1-p) ≥ 5.
2. The maximal margin of error is observed when p = 0.5.
3. The required sample size can be estimated as

$$n = \frac{z_{\alpha/2}^{2}\; p(1-p)}{E^{2}}$$

where p is the best guess for π or the result of a preliminary study.

[Figure: p(1-p) as a function of p, and the margin of error as a function of p; both peak at p = 0.5]
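Points 1 and 3 can be turned into two small helper functions. The function names and the target margin of error E = 0.05 are illustrative assumptions; the result is rounded up because a sample size must be a whole number.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_proportion(E, p=0.5, alpha=0.05):
    """n = z_{alpha/2}^2 * p * (1 - p) / E^2, rounded up.

    p = 0.5 is the worst case (maximal margin of error).
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return ceil(z ** 2 * p * (1 - p) / E ** 2)

def normal_approx_ok(n, p):
    """Rule of thumb: np >= 5 and n(1 - p) >= 5."""
    return n * p >= 5 and n * (1 - p) >= 5

n_worst = sample_size_for_proportion(E=0.05)          # no prior guess for pi
n_guess = sample_size_for_proportion(E=0.05, p=0.2)   # preliminary guess p = 0.2

print(n_worst, n_guess)
print(normal_approx_ok(270, 0.214815), normal_approx_ok(20, 0.1))
```

Using the worst case p = 0.5 gives a conservative n when no preliminary study is available.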

INTERVAL ESTIMATION
Population Mean: σ Unknown

Assume that we have a sample of 20 mice and would like to estimate the average weight of mice in the population.

Weight: 39.9, 19.8, 32.4, 21, 27.5, 20.8, 21.3, 40, 10.7, 22.6, 27, 10.8, 20.9, 14.7, 31.4, 17.2, 11.4, 19.1, 31.3, 14.8

m = 22.73
s = 8.84

$$\sigma_m = \frac{\sigma}{\sqrt{n}} \approx \frac{s}{\sqrt{n}}$$

As we replace σ by s, we introduce an additional error, and this changes the distribution from z to t (Student).

[Figure: z and t distributions. Note: the scale is not realistic, for illustration only.]

INTERVAL ESTIMATION
Population Mean: σ Unknown

t-distribution: a family of probability distributions that can be used to develop an interval estimate of a population mean whenever the population standard deviation σ is unknown and is estimated by the sample standard deviation s.

Degrees of freedom: a parameter of the t-distribution. When the t-distribution is used in the computation of an interval estimate of a population mean, the appropriate t-distribution has n - 1 degrees of freedom, where n is the size of the simple random sample.

INTERVAL ESTIMATION
Population Mean: σ Unknown

Weight: 39.9, 19.8, 32.4, 21, 27.5, 20.8, 21.3, 40, 10.7, 22.6, 27, 10.8, 20.9, 14.7, 31.4, 17.2, 11.4, 19.1, 31.3, 14.8

m = 22.73
s = 8.84
s(m) = 1.98
t = 2.09
m.e. = 4.14

$$\mu = m \pm t_{\alpha/2}^{(n-1)}\,\frac{s}{\sqrt{n}}$$

[Figure: t density with central area 1 - α = 0.95 and two tails of α/2 = 0.025 each]

In Excel use: = TINV(alpha, degrees-of-freedom)
Note: TINV is two-tailed, so pass alpha, not alpha/2!
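The values on this slide can be re-derived from the 20 weights. The Python standard library has no t-distribution quantile, so the sketch below hard-codes the critical value 2.093 for 19 degrees of freedom at 95% confidence, a standard table value consistent with the t = 2.09 shown on the slide; this is an assumption of the sketch, not a computed quantity.

```python
from math import sqrt
from statistics import mean, stdev

weights = [39.9, 19.8, 32.4, 21, 27.5, 20.8, 21.3, 40, 10.7, 22.6,
           27, 10.8, 20.9, 14.7, 31.4, 17.2, 11.4, 19.1, 31.3, 14.8]

n = len(weights)
m = mean(weights)            # sample mean
s = stdev(weights)           # sample standard deviation (n - 1 denominator)
s_m = s / sqrt(n)            # standard error of the mean, s(m)

# Assumed table value: t for alpha = 0.05 (two-tailed) and n - 1 = 19 degrees
# of freedom; in Excel this would come from =TINV(0.05, 19).
T_CRIT = 2.093

margin = T_CRIT * s_m        # margin of error
print(f"m = {m:.2f}, s = {s:.2f}, s(m) = {s_m:.2f}, m.e. = {margin:.2f}")
print(f"95% CI for mu: {m - margin:.2f} .. {m + margin:.2f}")
```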

INTERVAL ESTIMATION
Population Mean: Practical Advice

Advice 1: when to use

$$\mu = m \pm t_{\alpha/2}^{(n-1)}\,\frac{s}{\sqrt{n}}$$

Population                    Sample size
normal                        any n
not normal, symmetric         n ~ 10
not normal, skewed            n ~ 30
not normal, highly skewed     n ≥ 50

Advice 2: if n > 100 you can use z-statistics instead of t-statistics (the error will be < 1.5%).

INTERVAL ESTIMATION
Determining Sample Size

Let's focus on another aspect: how to select a proper number of experiments.

$$\mu = m \pm E(n, \sigma), \qquad E(n, \sigma) = \;?$$

$$E = z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \quad\Rightarrow\quad n = \frac{z_{\alpha/2}^{2}\,\sigma^{2}}{E^{2}}$$

[Figure: Effect of Sample Size. The n-dependent part of the confidence interval plotted against n, for n from 0 to 20]
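The sample size formula can be sketched as a short function. The planning value σ = 8.84 is borrowed from the mouse weight sample as an assumption (in practice σ would come from historical data or a pilot study), and the target margins E = 2 and E = 1 are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_mean(sigma, E, alpha=0.05):
    """n = z_{alpha/2}^2 * sigma^2 / E^2, rounded up to a whole number."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return ceil((z * sigma / E) ** 2)

# Assumed planning value for sigma and hypothetical target margins of error.
n_for_2 = sample_size_for_mean(sigma=8.84, E=2.0)
n_for_1 = sample_size_for_mean(sigma=8.84, E=1.0)

print(n_for_2, n_for_1)
```

Halving the margin of error roughly quadruples the required sample size, because E shrinks only as 1/√n.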

QUESTIONS?

Thank you for your attention.

To be continued...