BIOSTATISTICS. Lecture 1 Data Presentation and Descriptive Statistics. dr. Petr Nazarov

Microarray Center. BIOSTATISTICS. Lecture 1: Data Presentation and Descriptive Statistics. dr. Petr Nazarov, 25-02-2011, petr.nazarov@crp-sante.lu. Lecture 1. Data presentation and descriptive statistics

COURSE OVERVIEW: Organization
10 sessions = 9 themes + 1 finalizing session (30 hours in total).
1 session = 1 hr lecture mixed with 2 hr of practical work.
3 intermediate tests + a finalizing exam (solving tasks).
The plan can be found on Moodle. However, it may be corrected to fit the level of the group (in particular, the modeling part will be moved further).
Goal: your FINAL knowledge and skills in biological data analysis, not a relatively fair mark for your work!
Software: Microsoft Excel with the Data Analysis Add-In installed.
Data: http://edu.sablab.net/data/xls

COURSE OVERVIEW: Recommended Literature
Recommended literature: presentation, methodology.

COURSE OVERVIEW: Introduction
BIOSTATISTICS: why and where? Any biological study where numbers are measured or reported:
  drug discovery
  genomics and systems biology
  public health

OUTLINE: Lecture 1
Data and statistics
  elements, variables and observations
  data types (qualitative and quantitative)
  data scales (nominal, ordinal, interval, ratio)
Descriptive statistics: tabular and graphical presentation
  frequency distribution
  pie chart, bar chart and histogram representation
  cumulative distributions
  crosstabulation and scatter diagram
Descriptive statistics: numerical measures
  measures of location: mean, mode, median, quantiles/quartiles/percentiles
  measures of variability: variance, standard deviation, MAD, coefficient of variation
  other measures: skewness of the distribution
  z-score, Chebyshev's theorem, detection of outliers
Exploratory data analysis
  5-number summary and box plot
Measures of association between two variables
  covariance and correlation coefficient
  interpretation of the correlation coefficient

DATA AND STATISTICS: Elements, variables, observations, data scales and types

DATA AND STATISTICS: Data: Elements, Variables, Observations
Data: the facts and figures collected, analyzed, and summarized for presentation and interpretation.
In the data set below, each row is an element, each column is a variable, and the set of values recorded for one element is an observation.

Person              Place  Gender  Net Worth ($BIL)  Age  Source              Internet Fame Score
William Gates III   1      M       40                53   Microsoft           9.5
Warren Buffett      2      M       37                79   Berkshire Hathaway  6.6
Carlos Slim Helu    3      M       35                69   telecom             2.1
Lawrence Ellison    4      M       22.5              64   Oracle              2.8
Ingvar Kamprad      5      M       22                83   IKEA                2.4
Karl Albrecht       6      M       21.5              89   Aldi                3.6
Mukesh Ambani       7      M       19.5              51   petrochemicals      4.4
Lakshmi Mittal      8      M       19.3              58   steel               5.4
Theo Albrecht       9      M       18.8              87   Aldi                1.5
Amancio Ortega      10     M       18.3              73   Zara                1.9
Jim Walton          11     M       17.8              61   Wal-Mart            3.9
Alice Walton        12     F       17.6              59   Wal-Mart            2.9

Internet Fame Score: $\mathrm{IFS} = 3\,(\log_{10} N - 4.5)$
Can we consider Place as an element?

DATA AND STATISTICS: Data Scales and Types
Qualitative data
  Nominal scale: uses labels or names to identify an attribute of an element.
    Ex.1: Male, Female. Ex.2: Rooms #: 101, 102, 103, ...
  Ordinal scale: exhibits the properties of nominal data, and the order or rank of the data is meaningful.
    Ex.1: Winners: the 1st, 2nd, 3rd places. Ex.2: Marks: A, B, C, ...
Quantitative data
  Interval scale: demonstrates the properties of ordinal data, and the interval between values is expressed in terms of a fixed unit of measure.
    Ex.1: Examination score 0-100. Ex.2: Internet fame score.
  Ratio scale: demonstrates all the properties of interval data, and the ratio of two values is meaningful.
    Ex.1: Weight. Ex.2: Price.

DATA AND STATISTICS: Task: Define Data Scales
For the data set on the previous slide, define the scale of each variable: Person, Place, Gender, Net Worth ($BIL), Age, Source, Internet Fame Score.

TABULAR AND GRAPHICAL PRESENTATION: Frequency distribution, bar and pie charts, histogram, cumulative frequency distribution, scatter plot

TABULAR AND GRAPHICAL PRESENTATION: Frequency Distribution
Frequency distribution: a tabular summary of data showing the number (frequency) of items in each of several nonoverlapping classes.

Marks: A B C B A B B A B C

Mark   Frequency  Relative frequency  Percent frequency
A      3          0.3                 30%
B      5          0.5                 50%
C      2          0.2                 20%
Total  10         1                   100%

In MS Excel use the following functions:
  =COUNTIF(data,element) to get the number of elements found in the data area
  =SUM(data) to get the sum of the values in the data area
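The three tables on this slide can also be reproduced outside Excel. The following is a minimal Python sketch (Python is not part of the course material, and the variable names are illustrative) that counts the ten marks and derives the relative and percent frequencies.

```python
from collections import Counter

# The ten marks listed on the slide
marks = list("ABCBABBABC")

counts = Counter(marks)                                  # frequency distribution
n = sum(counts.values())                                 # total number of items
relative = {m: c / n for m, c in counts.items()}         # relative frequencies
percent = {m: 100 * c / n for m, c in counts.items()}    # percent frequencies

for mark in sorted(counts):
    print(mark, counts[mark], relative[mark], f"{percent[mark]:.0f}%")
```

Counter plays the role of COUNTIF here, and sum the role of SUM.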

TABULAR AND GRAPHICAL PRESENTATION: Example: Pancreatitis Study
The role of smoking in the etiology of pancreatitis has been recognized for many years. To provide estimates of the quantitative significance of these factors, a hospital-based study was carried out in eastern Massachusetts and Rhode Island between 1975 and 1979. 53 patients who had a hospital discharge diagnosis of pancreatitis were included in this unmatched case-control study. The control group consisted of 217 patients admitted for diseases other than those of the pancreas and biliary tract. Risk factor information was obtained from a standardized interview with each subject, conducted by a trained interviewer.
(adapted from Chap T. Le, Introductory Biostatistics)
Data file: pancreatitis.xls
Pancreatitis patients: Smokers, Ex-smokers, Ex-smokers, Smokers, Smokers, Smokers, Ex-smokers, Smokers, Smokers, Smokers, Smokers, Smokers, Ex-smokers, Smokers, Smokers, Ex-smokers, Smokers, Smokers, Ex-smokers, Ex-smokers, Smokers, Ex-smokers, Smokers, Smokers, Never, Smokers, Ex-smokers, Ex-smokers, Smokers, Ex-smokers, Smokers, Smokers, Ex-smokers, Smokers, Smokers, Smokers, Smokers, Smokers, Ex-smokers, Smokers, Smokers, Smokers, Smokers, Smokers, Smokers, Smokers, Smokers, Smokers, Smokers, Never, Smokers, Smokers, Smokers

FREQUENCY DISTRIBUTION: Relative Frequency Distribution
Frequency distribution: a tabular summary of data showing the number (frequency) of items in each of several nonoverlapping classes.
Relative frequency distribution: a tabular summary of data showing the fraction or proportion of data items in each of several nonoverlapping classes. The sum of all values should give 1.
Estimation of the probability distribution: when the number of experiments n tends to infinity, the relative frequency distribution (R.F.D.) tends to the probability distribution (P.D.).

Data file: pancreatitis.txt

Frequency distribution:
Smoking      Cases  Controls
Never        2      56
Ex-smokers   13     80
Smokers      38     81
Total        53     217

Relative frequency distribution:
Smoking      Cases  Controls
Never        0.038  0.258
Ex-smokers   0.245  0.369
Smokers      0.717  0.373
Total        1      1

In Excel use the following functions:
  =COUNTIF(data,element) to get the number of elements found in the data area
  =SUM(data) to get the sum of the values in the data area
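As a cross-check of the relative frequency table, here is a short Python sketch (an illustration only; the function name relative_frequencies is hypothetical) that divides each count by its column total.

```python
# Frequency distribution from the slide (pancreatitis cases and controls)
cases = {"Never": 2, "Ex-smokers": 13, "Smokers": 38}
controls = {"Never": 56, "Ex-smokers": 80, "Smokers": 81}

def relative_frequencies(freq):
    """Divide each class count by the total so that the values sum to 1."""
    total = sum(freq.values())
    return {label: count / total for label, count in freq.items()}

rel_cases = relative_frequencies(cases)
rel_controls = relative_frequencies(controls)

for label in cases:
    print(f"{label:11s} {rel_cases[label]:.3f} {rel_controls[label]:.3f}")
```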

TABULAR AND GRAPHICAL PRESENTATION: Bar and Pie Charts
Data file: pancreatitis.xls
[Figure: bar chart of percent frequency (0% to 80%) by smoking status (Never, Ex-smokers, Smokers) for the Pancreatitis and Control groups]
[Figure: pie charts of smoking status (Never, Ex-smokers, Smokers) for the Pancreatitis and Control groups]
Pie charts: try to avoid using them in scientific reports. For public/business presentations only!
In MS Excel use the following steps:
  Chart Wizard > Columns > Set data range (both columns of the percent frequency distribution)
  Chart Wizard > Pie > Set data range (one column of the percent frequency distribution)

TABULAR AND GRAPHICAL PRESENTATION: Crosstabulation
Data file: pancreatitis.xls

Smoking      Disease: other  Disease: pancreatitis  Total
Ex-smokers   80              13                     93
Never        56              2                      58
Smokers      81              38                     119
Total        217             53                     270

In Excel use the following steps:
  Data > Pivot Table and PivotChart > MS Office list + Pivot Table
  Set the range, including headers
  Select the output and set the layout by drag-and-dropping the names into the table

TABULAR AND GRAPHICAL PRESENTATION: Example: Mice Data
Series: Tordoff MG, Bachmanov AA. Survey of calcium & sodium intake and metabolism with bone and body composition data.
Project symbol: Tordoff3. Accession number: MPD:103.
Data file: mice.xls, 790 mice from different strains (http://phenome.jax.org)
Parameters: Starting age, Ending age, Starting weight, Ending weight, Weight change, Bleeding time, Ionized Ca in blood, Blood pH, Bone mineral density, Lean tissues weight, Fat weight

TABULAR AND GRAPHICAL PRESENTATION: Histogram
Data file: mice.xls
The following are weights in grams for 790 mice (first values shown):
20.5 23.2 24.6 23.5 26 25.9 23.9 22.8 19.9 20.8 22.4 26 23.8 26.5 26 22.8 22.9 20.9 19.8 22.7 31 22.7 26.3 27.1 18.4 21 18.8 21 21.4 25.7 19.7 27 26.2 21.8 22.2 19.2 21.9 22.6 23.7 26.2 26 27.5 25 20.9 20.6 22.1 20 21.1 24.1 28.8 30.2 20.1 24.2 25.8 21.3 21.8 23.7 23.5 28 27.6 21.6 21 21.3 20.1 20.8 24.5 23.8 29.5 21.4 21.5 24 21.1 18.9 19.5 32.3 28 27.1 28.2 22.9 19.9 20.4 21.3 20.6 22.8 25.8 24.1 23.5 24.2 22 20.3
Sorted weights show that the values lie between 10 and 49.6 grams. Let us divide the weights into bins:

Weight, g  Frequency
<=10       1
10-20      237
20-30      417
30-40      124
40-50      11
More       0

TABULAR AND GRAPHICAL PRESENTATION: Histogram
Now, let us use bin size = 1 gram:

Bin   Frequency
10    1
11    13
12    12
13    25
14    29
...
46    1
47    1
48    0
49    1
50    1
More  0

[Figure: histogram, Frequency (0 to 60) vs. Weight, g (10 to 50)]
In Excel use the following steps:
  Specify the column of bin (interval) upper limits
  Tools > Data Analysis > Histogram > select the input data, bins, and output (Analysis ToolPak should be installed)
  Use Chart Wizard > Columns to visualize the results
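The binning step can be sketched in Python as follows. This is an illustration under the assumption that a value is counted in the first bin whose upper limit is greater than or equal to it, with a final "More" bin for anything above the last limit; the function name histogram is hypothetical.

```python
from bisect import bisect_left

def histogram(values, upper_limits):
    """Count values per bin; bin i holds previous_limit < x <= upper_limits[i].

    upper_limits must be sorted in ascending order. The extra last bin
    ("More") collects the values above the largest limit.
    """
    counts = [0] * (len(upper_limits) + 1)
    for x in values:
        counts[bisect_left(upper_limits, x)] += 1
    return counts

# First ten mouse weights from the histogram slide, 10-gram bins
weights = [20.5, 23.2, 24.6, 23.5, 26, 25.9, 23.9, 22.8, 19.9, 20.8]
limits = [10, 20, 30, 40, 50]
print(dict(zip(limits + ["More"], histogram(weights, limits))))
```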

TABULAR AND GRAPHICAL PRESENTATION: Cumulative Frequency Distribution
Cumulative frequency distribution: a tabular summary of quantitative data showing the number of items with values less than or equal to the upper class limit of each class.
[Figure: histogram, Frequency (0 to 60) vs. Weight, g (10 to 50)]
[Figure: ogive, Cumulative relative frequency (0 to 1) vs. Weight, g (10 to 50)]
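A cumulative distribution is a running sum of the bin frequencies. The sketch below (Python, for illustration; variable names are not from the lecture) applies this to the 10-gram bin counts of the mouse weights given on the histogram slide.

```python
from itertools import accumulate

upper_limits = [10, 20, 30, 40, 50]     # upper class limits, g
frequency = [1, 237, 417, 124, 11]      # counts per bin from the histogram slide

cumulative = list(accumulate(frequency))          # items with value <= upper limit
n = cumulative[-1]                                # total number of mice
cumulative_relative = [c / n for c in cumulative]  # points of the ogive

for limit, c, r in zip(upper_limits, cumulative, cumulative_relative):
    print(limit, c, round(r, 3))
```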

TABULAR AND GRAPHICAL PRESENTATION: Scatter Plot
Data file: mice.xls
Let us look at the mutual dependency of the Starting and Ending weights.
[Figure: scatter plot, Ending weight (0 to 50) vs. Starting weight (0 to 50)]
In Excel use the following steps: select the data region > use Chart Wizard > XY (Scatter)

NUMERICAL MEASURES: Population and sample, measures of location, quantiles, quartiles and percentiles, measures of variability, z-score, detection of outliers, exploratory data analysis, box plot, covariation, correlation

NUMERICAL MEASURES: Population and Sample
Population parameter: a numerical value used as a summary measure for a population (e.g., the population mean $\mu$, variance $\sigma^2$, standard deviation $\sigma$).
Sample statistic: a numerical value used as a summary measure for a sample (e.g., the sample mean $m$, the sample variance $s^2$, and the sample standard deviation $s$).

POPULATION (all existing laboratory Mus musculus): $\mu$ mean, $\sigma^2$ variance, $N$ number of elements (usually $N = \infty$)
SAMPLE (mice.xls, 790 mice from different strains, http://phenome.jax.org): $m$ (or $\bar{x}$) mean, $s^2$ variance, $n$ number of elements

ID, Strain, Sex, Starting age, Ending age, Starting weight, Ending weight, Weight change, Bleeding time, Ionized Ca in blood, Blood pH, Bone mineral density, Lean tissues weight, Fat weight
1, 129S1/SvImJ, f, 66, 116, 19.3, 20.5, 1.062, 64, 1.2, 7.24, 0.0605, 14.5, 4.4
2, 129S1/SvImJ, f, 66, 116, 19.1, 20.8, 1.089, 78, 1.15, 7.27, 0.0553, 13.9, 4.4
3, 129S1/SvImJ, f, 66, 108, 17.9, 19.8, 1.106, 90, 1.16, 7.26, 0.0546, 13.8, 2.9
368, 129S1/SvImJ, f, 72, 114, 18.3, 21, 1.148, 65, 1.26, 7.22, 0.0599, 15.4, 4.2
369, 129S1/SvImJ, f, 72, 115, 20.2, 21.9, 1.084, 55, 1.23, 7.3, 0.0623, 15.6, 4.3
370, 129S1/SvImJ, f, 72, 116, 18.8, 22.1, 1.176, , 1.21, 7.28, 0.0626, 16.4, 4.3
371, 129S1/SvImJ, f, 72, 119, 19.4, 21.3, 1.098, 49, 1.24, 7.24, 0.0632, 16.6, 5.4
372, 129S1/SvImJ, f, 72, 122, 18.3, 20.1, 1.098, 73, 1.17, 7.19, 0.0592, 16, 4.1
4, 129S1/SvImJ, f, 66, 109, 17.2, 18.9, 1.099, 41, 1.25, 7.29, 0.0513, 14, 3.2
5, 129S1/SvImJ, f, 66, 112, 19.7, 21.3, 1.081, 129, 1.14, 7.22, 0.0501, 16.3, 5.2
10, 129S1/SvImJ, m, 66, 112, 24.3, 24.7, 1.016, 119, 1.13, 7.24, 0.0533, 17.6, 6.8
364, 129S1/SvImJ, m, 72, 114, 25.3, 27.2, 1.075, 64, 1.25, 7.27, 0.0596, 19.3, 5.8
365, 129S1/SvImJ, m, 72, 115, 21.4, 23.9, 1.117, 48, 1.25, 7.28, 0.0563, 17.4, 5.7
366, 129S1/SvImJ, m, 72, 118, 24.5, 26.3, 1.073, 59, 1.25, 7.26, 0.0609, 17.8, 7.1
367, 129S1/SvImJ, m, 72, 122, 24, 26, 1.083, 69, 1.29, 7.26, 0.0584, 19.2, 4.6
6, 129S1/SvImJ, m, 66, 116, 21.6, 23.3, 1.079, 78, 1.15, 7.27, 0.0497, 17.2, 5.7
7, 129S1/SvImJ, m, 66, 107, 22.7, 26.5, 1.167, 90, 1.18, 7.28, 0.0493, 18.7, 7
8, 129S1/SvImJ, m, 66, 108, 25.4, 27.4, 1.079, 35, 1.24, 7.26, 0.0538, 18.9, 7.1
9, 129S1/SvImJ, m, 66, 109, 24.4, 27.5, 1.127, 43, 1.29, 7.29, 0.0539, 19.5, 7.1

NUMERICAL MEASURES: Measures of Location
Mean: a measure of central location computed by summing the data values and dividing by the number of observations.
  Population mean: $\mu = \frac{\sum x_i}{N}$
  Sample mean: $m = \bar{x} = \frac{\sum x_i}{n}$
  Proportion: $p = \frac{\sum (x_i = \text{true})}{n}$
Median: a measure of central location provided by the value in the middle when the data are arranged in ascending order.
Mode: a measure of location, defined as the value that occurs with the greatest frequency.

Weight: 12 16 19 22 23 23 24 32 36 42 63 68
Mode = 23, Median = 23.5, Mean = 31.7
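The three measures of location for the twelve weights can be verified with Python's standard statistics module (a sketch for illustration; the course itself uses Excel's AVERAGE, MEDIAN and MODE).

```python
import statistics

# The twelve weights from the slide
weights = [12, 16, 19, 22, 23, 23, 24, 32, 36, 42, 63, 68]

mean = statistics.mean(weights)       # sum of values divided by their number
median = statistics.median(weights)   # average of the two middle values (n is even)
mode = statistics.mode(weights)       # most frequent value

print(round(mean, 1), median, mode)
```

The mean is pulled above the median by the two large values 63 and 68.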

NUMERICAL MEASURES: Measures of Location
Data file: mice.xls
[Figure: histogram and p.d.f. approximation of mouse weight (density vs. weight, g), with the median, mean and mode marked]
Female proportion: p_f = 0.501
[Figure: density of bleeding time (N = 760, bandwidth = 5.347): median = 55, mean = 61, mode = 48]
In Excel use the following functions: =AVERAGE(data), =MEDIAN(data), =MODE(data)

NUMERICAL MEASURES: Quantiles, Quartiles and Percentiles
Percentile: a value such that at least p% of the observations are less than or equal to this value, and at least (100-p)% of the observations are greater than or equal to this value. The 50th percentile is the median.
Quartiles: the 25th, 50th, and 75th percentiles, referred to as the first quartile, the second quartile (median), and the third quartile, respectively.
In Excel use the following function: =PERCENTILE(data,p)

Weight: 12 16 19 22 23 23 24 32 36 42 63 68
Q1 = 21, Q2 = 23.5, Q3 = 39
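Quartile values depend on the computation rule, and the slide does not say which rule it used. The sketch below implements one common textbook rule as an assumption: compute i = (p/100)n; if i is not an integer, round it up and take that ordered value; if it is an integer, average the i-th and (i+1)-th ordered values. With this rule the weights give Q1 = 20.5 (the slide appears to report it rounded, as 21), Q2 = 23.5 and Q3 = 39. Excel's PERCENTILE interpolates linearly instead and returns 21.25 and 37.5 for the first and third quartiles of the same data.

```python
import math

def percentile(values, p):
    """p-th percentile (0 < p <= 100) by the position rule i = (p/100) * n."""
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    data = sorted(values)
    n = len(data)
    i = p * n / 100
    if not i.is_integer():
        return data[math.ceil(i) - 1]           # round position up (1-based)
    i = int(i)
    # integer position: average the i-th and (i+1)-th values (1-based)
    return (data[i - 1] + data[min(i, n - 1)]) / 2

weights = [12, 16, 19, 22, 23, 23, 24, 32, 36, 42, 63, 68]
q1, q2, q3 = (percentile(weights, p) for p in (25, 50, 75))
print(q1, q2, q3)
```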

NUMERICAL MEASURES: Measures of Variability
Interquartile range (IQR): a measure of variability, defined to be the difference between the third and first quartiles: $\mathrm{IQR} = Q_3 - Q_1$
Variance: a measure of variability based on the squared deviations of the data values about the mean.
  Population: $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$
  Sample: $s^2 = \frac{\sum (x_i - m)^2}{n - 1}$
Standard deviation: a measure of variability computed by taking the positive square root of the variance.
  Sample standard deviation: $s = \sqrt{s^2}$
  Population standard deviation: $\sigma = \sqrt{\sigma^2}$

Weight: 12 16 19 22 23 23 24 32 36 42 63 68
IQR = 18, Variance = 320.2, St. dev. = 17.9
In Excel use the following functions: =VAR(data), =STDEV(data)
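The sample and population formulas differ only in the divisor (n - 1 versus N). A short Python sketch using the standard statistics module illustrates both on the twelve weights; the sample versions reproduce the variance and standard deviation quoted on the slide.

```python
import statistics

weights = [12, 16, 19, 22, 23, 23, 24, 32, 36, 42, 63, 68]

sample_variance = statistics.variance(weights)        # divides by n - 1
sample_sd = statistics.stdev(weights)                 # square root of the above
population_variance = statistics.pvariance(weights)   # divides by N

print(round(sample_variance, 1), round(sample_sd, 1), round(population_variance, 1))
```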

NUMERICAL MEASURES: Measures of Variability
Coefficient of variation: a measure of relative variability computed by dividing the standard deviation by the mean: $\mathrm{CV} = \frac{\text{standard deviation}}{\text{mean}} \times 100\%$

Weight: 12 16 19 22 23 23 24 32 36 42 63 68
CV = 57%

Median absolute deviation (MAD): a robust measure of the variability of a univariate sample of quantitative data: $\mathrm{MAD} = \mathrm{median}(|x_i - \mathrm{median}(x)|)$

Set 1: 23 12 22 12 21 18 22 20 12 19 14 13 17
Set 2: 23 12 22 12 21 81 22 20 12 19 14 13 17

         Set 1  Set 2
Mean     17.3   22.2
Median   18     19
St.dev.  4.23   18.18
MAD      5.93   5.93
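The robustness of the MAD can be checked numerically. In the Python sketch below (illustrative; the function names and the scale parameter are assumptions), the raw median absolute deviation of both sets is 4. The value 5.93 in the table corresponds to 4 multiplied by 1.4826, the constant commonly used to make the MAD comparable to the standard deviation for normally distributed data.

```python
import statistics

def cv_percent(values):
    """Coefficient of variation: sample standard deviation / mean, in percent."""
    return 100 * statistics.stdev(values) / statistics.mean(values)

def mad(values, scale=1.0):
    """Median absolute deviation about the median, optionally rescaled."""
    med = statistics.median(values)
    return scale * statistics.median([abs(v - med) for v in values])

weights = [12, 16, 19, 22, 23, 23, 24, 32, 36, 42, 63, 68]
set1 = [23, 12, 22, 12, 21, 18, 22, 20, 12, 19, 14, 13, 17]
set2 = [23, 12, 22, 12, 21, 81, 22, 20, 12, 19, 14, 13, 17]  # 18 replaced by 81

print(round(cv_percent(weights)))                                 # CV of the weights
print(round(statistics.stdev(set1), 2), round(statistics.stdev(set2), 2))
print(round(mad(set1, 1.4826), 2), round(mad(set2, 1.4826), 2))
```

A single changed value inflates the standard deviation roughly fourfold, while the MAD stays the same.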

NUMERICAL MEASURES: Measures of Variability
Skewness: a measure of the shape of a data distribution. Data skewed to the left result in negative skewness; a symmetric data distribution results in zero skewness; and data skewed to the right result in positive skewness.
$\mathrm{Skewness} = \frac{n}{(n-1)(n-2)} \sum_i \left( \frac{x_i - m}{s} \right)^3$
[Figure adapted from Anderson et al., Statistics for Business and Economics]
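The skewness formula translates directly into code. A minimal Python sketch (the function name is illustrative), using the sample mean and sample standard deviation:

```python
import statistics

def skewness(values):
    """Sample skewness: n / ((n-1)(n-2)) * sum(((x_i - m) / s) ** 3)."""
    n = len(values)
    m = statistics.mean(values)
    s = statistics.stdev(values)      # sample standard deviation (n - 1)
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in values)

symmetric = [1, 2, 3, 4, 5]
weights = [12, 16, 19, 22, 23, 23, 24, 32, 36, 42, 63, 68]  # long right tail

print(skewness(symmetric), skewness(weights))
```

The symmetric set gives zero, and the weights, with their long right tail, give a positive value.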

NUMERICAL MEASURES: z-score and Detection of Outliers
z-score: a value computed by dividing the deviation about the mean $(x_i - m)$ by the standard deviation $s$. A z-score is referred to as a standardized value and denotes the number of standard deviations $x_i$ is from the mean: $z_i = \frac{x_i - m}{s}$
Chebyshev's theorem: for any data set, at least $(1 - 1/z^2)$ of the data values must be within $z$ standard deviations from the mean, where $z$ is any value > 1.

Weight  z-score
12      -1.10
16      -0.88
19      -0.71
22      -0.54
23      -0.48
23      -0.48
24      -0.43
32      0.02
36      0.24
42      0.58
63      1.75
68      2.03

For ANY distribution:
  At least 75% of the values are within z = 2 standard deviations from the mean
  At least 89% of the values are within z = 3 standard deviations from the mean
  At least 94% of the values are within z = 4 standard deviations from the mean
  At least 96% of the values are within z = 5 standard deviations from the mean
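Both the z-score table and the Chebyshev percentages can be recomputed in a few lines. The Python sketch below is an illustration; the function names are not from the lecture.

```python
import statistics

def z_scores(values):
    """Standardize each value: (x_i - m) / s with the sample standard deviation."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return [(x - m) / s for x in values]

def chebyshev_bound(z):
    """Minimum fraction of values within z standard deviations of the mean (z > 1)."""
    if z <= 1:
        raise ValueError("z must be greater than 1")
    return 1 - 1 / z ** 2

weights = [12, 16, 19, 22, 23, 23, 24, 32, 36, 42, 63, 68]
z = z_scores(weights)
print([round(v, 2) for v in z])
print([round(chebyshev_bound(k), 4) for k in (2, 3, 4, 5)])
```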

NUMERICAL MEASURES: Detection of Outliers
For bell-shaped distributions:
  Approximately 68% of the values are within 1 st.dev. from the mean
  Approximately 95% of the values are within 2 st.dev. from the mean
  Almost all data points are within 3 st.dev. from the mean
[Figure: example, Gaussian distribution]
Outlier: an unusually small or unusually large data value. For bell-shaped distributions, data points with |z| > 3 can be considered as outliers.

Weight  z-score
23      0.04
12      -0.53
22      -0.01
12      -0.53
21      -0.06
81      3.10
22      -0.01
20      -0.11
12      -0.53
19      -0.17
14      -0.43
13      -0.48
17      -0.27

NUMERICAL MEASURES: Task: Detection of Outliers
Data file: mice.xls
Using Excel, try to identify outlier mice on the basis of the Weight change variable.
$z_i = \frac{x_i - m}{s}$
For bell-shaped distributions, data points with |z| > 3 can be considered as outliers.
In Excel use the following functions:
  =AVERAGE(data): mean, m
  =STDEV(data): standard deviation, s
  =ABS(data): absolute value
  Sort by z-score to identify the outliers
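The same procedure can be sketched in Python. Because the Weight change column of mice.xls is not reproduced here, the example runs on a small stand-in sample (thirteen weights, one of which is 81); the function name and the default threshold argument are assumptions for illustration.

```python
import statistics

def find_outliers(values, threshold=3.0):
    """Return the values whose |z-score| exceeds the threshold."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return [x for x in values if abs((x - m) / s) > threshold]

# Stand-in sample (not the mice data): one value is far from the rest
sample = [23, 12, 22, 12, 21, 81, 22, 20, 12, 19, 14, 13, 17]
print(find_outliers(sample))
```

Note that an extreme value also inflates the mean and the standard deviation used to compute its own z-score, which is one reason robust measures such as the median and the MAD are useful.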

NUMERICAL MEASURES: Exploratory Data Analysis
Five-number summary: an exploratory data analysis technique that uses five numbers to summarize the data: smallest value, first quartile, median, third quartile, and largest value.

Data file: children.xls
Min.:   12
Q1:     25
Median: 32
Q3:     46
Max.:   79

In Excel use: Tools > Data Analysis > Descriptive Statistics

Box plot: a graphical summary of data based on a five-number summary.
[Figure: box plot marking Min, Q1, Q2, Q3, Max, with whiskers limited to 1.5 IQR]
In Excel use (indirect): Chart Wizard > Stock > Open-high-low-close
  open:  Q3
  high:  Q3 + 1.5*IQR
  low:   Q1 - 1.5*IQR
  close: Q1

NUMERICAL MEASURES: Example: Mice Weight
Example: build a box plot for the weights of male and female mice (data file: mice.xls).

1. Build 5-number summaries for males and females:

     Female  Male
Min  10.0    12.0
Q1   17.2    23.8
Q2   20.7    27.1
Q3   23.3    31.2
Max  41.5    49.6

2. Combine the numbers in the following order:
  open:  Q3
  high:  min(Q3 + 1.5*(Q3-Q1), Max)
  low:   max(Q1 - 1.5*(Q3-Q1), Min)
  close: Q1

In Excel use: Chart Wizard > Stock > Open-high-low-close; put series in rows; adjust colors, etc.
[Figure: box plots of mouse weight, g (0 to 45) for Female and Male]
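Step 2 can be sketched in Python as below. This mirrors the Excel stock-chart workaround, in which each whisker ends at the 1.5 IQR fence or at the data extreme, whichever is closer to the box; the function name is hypothetical.

```python
def stock_chart_values(minimum, q1, q3, maximum):
    """Return (open, high, low, close) for the Excel stock-chart box plot."""
    iqr = q3 - q1
    high = min(q3 + 1.5 * iqr, maximum)   # upper whisker, capped at the maximum
    low = max(q1 - 1.5 * iqr, minimum)    # lower whisker, capped at the minimum
    return q3, high, low, q1

# Five-number summaries from the slide (Q2 is not needed for the whiskers)
female = stock_chart_values(10.0, 17.2, 23.3, 41.5)
male = stock_chart_values(12.0, 23.8, 31.2, 49.6)
print([round(v, 2) for v in female])
print([round(v, 2) for v in male])
```

For the female mice the lower whisker stops at the minimum (10.0), while the upper whisker stops at the fence, so the largest female weights lie beyond it.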

NUMERICAL MEASURES: Measure of Association between 2 Variables
Covariance: a measure of linear association between two variables. Positive values indicate a positive relationship; negative values indicate a negative relationship.
  Population: $\sigma_{xy} = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N}$
  Sample: $s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$

Data file: mice.xls
[Figure: scatter plot, Ending weight (0 to 60) vs. Starting weight (0 to 50)]
In Excel use the function: =COVAR(data)
Result: s_xy = 39.8 (hard to interpret)

NUMERICAL MEASURES: Measure of Association between 2 Variables
Correlation (Pearson product moment correlation coefficient): a measure of linear association between two variables that takes on values between -1 and +1. Values near +1 indicate a strong positive linear relationship, values near -1 indicate a strong negative linear relationship, and values near zero indicate the lack of a linear relationship.
  Population: $\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y} = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{\sigma_x \sigma_y N}$
  Sample: $r_{xy} = \frac{s_{xy}}{s_x s_y} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{s_x s_y (n - 1)}$

Data file: mice.xls
[Figure: scatter plot, Ending weight (0 to 60) vs. Starting weight (0 to 50)]
In Excel use the function: =CORREL(data)
Result: r_xy = 0.94
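The sample covariance and correlation formulas can be written out explicitly. The Python sketch below is illustrative: the function names are assumptions, and the data are only the starting and ending weights of the first five mice listed on the Population and Sample slide, so the result will not equal the full-data value r_xy = 0.94.

```python
import statistics

def covariance(x, y):
    """Sample covariance: sum((x_i - mean_x)(y_i - mean_y)) / (n - 1)."""
    mx, my = statistics.mean(x), statistics.mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def correlation(x, y):
    """Pearson correlation: covariance divided by both standard deviations."""
    return covariance(x, y) / (statistics.stdev(x) * statistics.stdev(y))

# Starting and ending weights of the first five mice in the table
starting = [19.3, 19.1, 17.9, 18.3, 20.2]
ending = [20.5, 20.8, 19.8, 21.0, 21.9]

print(round(covariance(starting, ending), 3), round(correlation(starting, ending), 3))
```

Unlike the covariance, the correlation does not change when the units of x or y change, which is why it is easier to interpret. With only two distinct points the fitted line passes through both, so the correlation is exactly +1 or -1.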

NUMERICAL MEASURES: Correlation Coefficient
If we have only 2 data points in the x and y data sets, what values would you expect for the correlation between x and y?
[Figure: examples of scatter plots and their correlation coefficients, from Wikipedia]

NUMERICAL MEASURES: Weighted Mean
Weighted mean: the mean obtained by assigning each observation a weight that reflects its importance.
$m = \frac{\sum w_i x_i}{\sum w_i}$
As an example of the need for a weighted mean, consider the following sample of five purchases of a raw material over several months. Note that the cost per pound varies from $2.80 to $3.40, and the quantity purchased has varied from 500 to 2750. Suppose that a manager asked for information about the mean cost per pound of the raw material. If we used a simple mean of the cost per pound, we would overestimate the average cost!
(Anderson et al., Statistics for Business and Economics)
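The purchase table itself is not reproduced in this transcript, so the Python sketch below uses assumed illustrative values chosen only to match the ranges quoted above (cost from $2.80 to $3.40 per pound, quantity from 500 to 2750 pounds). It shows why the simple mean overestimates the cost: the largest purchase was made at the lowest price.

```python
# Assumed illustrative purchases (not taken from the slide)
cost_per_pound = [3.00, 3.40, 2.80, 2.90, 3.25]   # x_i
pounds = [1200, 500, 2750, 1000, 800]             # weights w_i

simple_mean = sum(cost_per_pound) / len(cost_per_pound)
weighted_mean = sum(w * x for w, x in zip(pounds, cost_per_pound)) / sum(pounds)

print(round(simple_mean, 2), round(weighted_mean, 2))
```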

NUMERICAL MEASURES: Grouped Data
Grouped data: data available in class intervals as summarized by a frequency distribution. Individual values of the original data are not available.

Data file: children.xls (individual values not available)

Bin   Frequency
20    5
30    21
40    8
50    14
60    3
70    4
80    2
More  0

Mean for grouped data: $m = \frac{\sum_{i=1}^{k} f_i M_i}{n}$
Variance for grouped data: $s^2 = \frac{\sum_{i=1}^{k} f_i (M_i - m)^2}{n - 1}$
where $f_i$ is the frequency of class $i$, $M_i$ is the midpoint of class $i$, and $k$ is the number of classes.
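A possible Python sketch of the grouped-data formulas, applied to the frequency table above. It assumes that the listed bins are upper limits of classes 10 units wide, so that the class midpoints are 15, 25, ..., 75; this assumption is not stated on the slide.

```python
upper_limits = [20, 30, 40, 50, 60, 70, 80]
frequency = [5, 21, 8, 14, 3, 4, 2]        # f_i
class_width = 10                           # assumed width of every class

midpoints = [u - class_width / 2 for u in upper_limits]   # M_i
n = sum(frequency)

grouped_mean = sum(f * m for f, m in zip(frequency, midpoints)) / n
grouped_variance = sum(
    f * (m - grouped_mean) ** 2 for f, m in zip(frequency, midpoints)
) / (n - 1)

print(n, round(grouped_mean, 2), round(grouped_variance, 1))
```

Because every value in a class is replaced by the class midpoint, the results are approximations of the mean and variance of the original data.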

QUESTIONS? Thank you for your attention. To be continued.