Microarray Center
BIOSTATISTICS
Lecture 1. Data Presentation and Descriptive Statistics
dr. Petr Nazarov
25-02-2011
petr.nazarov@crp-sante.lu
COURSE OVERVIEW
Organization
10 sessions = 9 themes + 1 finalizing session (30 hours in total)
1 session = 1 hr of lecture mixed with 2 hr of practical work
3 intermediate tests + a finalizing exam (solving tasks)
The plan can be found on Moodle. However, it may be corrected to fit the level of the group (especially, the modeling part will be moved further).
Goal: your FINAL knowledge and skills in biological analysis, not a relatively fair mark for your work!
Software: Microsoft Excel with the Data Analysis Add-In installed
Data: http://edu.sablab.net//xls
COURSE OVERVIEW
Recommended Literature: presentation and methodology
COURSE OVERVIEW
Introduction
BIOSTATISTICS: why and where?
Any biological study where numbers are measured or reported: drug discovery, genomics and systems biology, public health.
OUTLINE
Lecture 1
Data and statistics: elements, variables and observations; types of data (qualitative and quantitative) and scales (nominal, ordinal, interval, ratio)
Descriptive statistics, tabular and graphical presentation: frequency distribution; pie and bar charts; histogram representation; cumulative distributions; crosstabulation; scatter diagram
Descriptive statistics, numerical measures:
measures of location: mean, mode, median, quantiles/quartiles/percentiles
measures of variability: variance, standard deviation, MAD, coefficient of variation
other measures: skewness of distribution
z-score. Chebyshev's theorem. Detection of outliers.
Exploratory analysis: 5-number summary and box plot
Measures of association between two variables: covariance and correlation coefficient; interpretation of the correlation coefficient
DATA AND STATISTICS
Elements, variables, observations; scales and types
DATA AND STATISTICS
Data: Elements, Variables, Observations
Data: the facts and figures collected, analyzed, and summarized for presentation and interpretation.
In the table below, each person is an element, each column is a variable, and the set of values recorded for one element is an observation.
Person | Place | Gender | Net Worth ($BIL) | Age | Source | Internet Fame Score
William Gates III | 1 | M | 40 | 53 | Microsoft | 9.5
Warren Buffett | 2 | M | 37 | 79 | Berkshire Hathaway | 6.6
Carlos Slim Helu | 3 | M | 35 | 69 | telecom | 2.1
Lawrence Ellison | 4 | M | 22.5 | 64 | Oracle | 2.8
Ingvar Kamprad | 5 | M | 22 | 83 | IKEA | 2.4
Karl Albrecht | 6 | M | 21.5 | 89 | Aldi | 3.6
Mukesh Ambani | 7 | M | 19.5 | 51 | petrochemicals | 4.4
Lakshmi Mittal | 8 | M | 19.3 | 58 | steel | 5.4
Theo Albrecht | 9 | M | 18.8 | 87 | Aldi | 1.5
Amancio Ortega | 10 | M | 18.3 | 73 | Zara | 1.9
Jim Walton | 11 | M | 17.8 | 61 | Wal-Mart | 3.9
Alice Walton | 12 | F | 17.6 | 59 | Wal-Mart | 2.9
Internet Fame Score: \( IFS = 3(\log_{10} N - 4.5) \)
Can we consider Place as an element?
DATA AND STATISTICS
Data Scales and Types
Qualitative data
Nominal scale: data use labels or names to identify an attribute of an element. Ex.1: Male, Female. Ex.2: Rooms #: 101, 102, 103, ...
Ordinal scale: data exhibit the properties of nominal data, and the order or rank of the data is meaningful. Ex.1: Winners: the 1st, 2nd, 3rd places. Ex.2: Marks: A, B, C, ...
Quantitative data
Interval scale: data demonstrate the properties of ordinal data, and the interval between values is expressed in terms of a fixed unit of measure. Ex.1: Examination score 0-100. Ex.2: Internet fame score.
Ratio scale: data demonstrate all the properties of interval data, and the ratio of two values is meaningful. Ex.1: Weight. Ex.2: Price.
DATA AND STATISTICS
Task: Define Scales
For each variable in the table of persons above (Person, Place, Gender, Net Worth ($BIL), Age, Source, Internet Fame Score), define its scale. What is the scale of \( IFS = 3(\log_{10} N - 4.5) \)?
TABULAR AND GRAPHICAL PRESENTATION
Frequency distribution, bar and pie charts, histogram, cumulative frequency distribution, scatter plot
TABULAR AND GRAPHICAL PRESENTATION
Frequency Distribution
Frequency distribution: a tabular summary of data showing the number (frequency) of items in each of several nonoverlapping classes.
Marks: A B C B A B B A B C
Frequency distribution:
Mark | Frequency
A | 3
B | 5
C | 2
Total | 10
Relative frequency distribution:
Mark | Relative frequency
A | 0.3
B | 0.5
C | 0.2
Total | 1
Percent frequency distribution:
Mark | Percent frequency
A | 30%
B | 50%
C | 20%
Total | 100%
In MS Excel use the following functions:
=COUNTIF(range, element) to get the number of elements found in the range
=SUM(range) to get the sum of values in the range
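The three tables can also be built outside Excel. Below is a minimal Python sketch that uses only the standard library. `Counter` does the job of COUNTIF and a plain `sum` does the job of SUM. The function name `frequency_tables` is hypothetical.

```python
from collections import Counter

def frequency_tables(items):
    """Return frequency, relative frequency and percent frequency per class."""
    counts = Counter(items)            # class -> count, the job of =COUNTIF
    total = sum(counts.values())       # total count, the job of =SUM
    classes = sorted(counts)
    frequency = {c: counts[c] for c in classes}
    relative = {c: counts[c] / total for c in classes}
    percent = {c: 100 * counts[c] / total for c in classes}
    return frequency, relative, percent

marks = "A B C B A B B A B C".split()
frequency, relative, percent = frequency_tables(marks)
for c in frequency:
    print(f"{c}  {frequency[c]:2d}  {relative[c]:.1f}  {percent[c]:.0f}%")
```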
TABULAR AND GRAPHICAL PRESENTATION
Example: Pancreatitis Study
The role of smoking in the etiology of pancreatitis has been recognized for many years. To provide estimates of the quantitative significance of these factors, a hospital-based study was carried out in eastern Massachusetts and Rhode Island between 1975 and 1979. 53 patients who had a hospital discharge diagnosis of pancreatitis were included in this unmatched case-control study. The control group consisted of 217 patients admitted for diseases other than those of the pancreas and biliary tract. Risk factor information was obtained from a standardized interview with each subject, conducted by a trained interviewer.
(adapted from Chap T. Le, Introductory Biostatistics)
pancreatitis.xls
Pancreatitis patients: Smokers, Ex-smokers, Ex-smokers, Smokers, Smokers, Smokers, Ex-smokers, Smokers, Smokers, Smokers, Smokers, Smokers, Ex-smokers, Smokers, Smokers, Ex-smokers, Smokers, Smokers, Ex-smokers, Ex-smokers, Smokers, Ex-smokers, Smokers, Smokers, Never, Smokers, Ex-smokers, Ex-smokers, Smokers, Ex-smokers, Smokers, Smokers, Ex-smokers, Smokers, Smokers, Smokers, Smokers, Smokers, Ex-smokers, Smokers, Smokers, Smokers, Smokers, Smokers, Smokers, Smokers, Smokers, Smokers, Smokers, Never, Smokers, Smokers
FREQUENCY DISTRIBUTION
Relative Frequency Distribution
Frequency distribution: a tabular summary of data showing the number (frequency) of items in each of several nonoverlapping classes.
Relative frequency distribution: a tabular summary of data showing the fraction or proportion of items in each of several nonoverlapping classes. The sum of all values should give 1.
Estimation of the probability distribution: when the number of experiments n → ∞, the relative frequency distribution (R.F.D.) → the probability distribution (P.D.).
pancreatitis.txt
Frequency distribution:
Smoking | Cases | Controls
Never | 2 | 56
Ex-smokers | 13 | 80
Smokers | 38 | 81
Total | 53 | 217
Relative frequency distribution:
Smoking | Cases | Controls
Never | 0.038 | 0.258
Ex-smokers | 0.245 | 0.369
Smokers | 0.717 | 0.373
Total | 1 | 1
In Excel use the following functions:
=COUNTIF(range, element) to get the number of elements found in the range
=SUM(range) to get the sum of values in the range
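A simulation can check the claim that the relative frequency distribution approaches the probability distribution as n grows. This Python sketch assumes, purely for illustration, that the "true" distribution equals the controls' relative frequencies. It then draws samples of increasing size with `random.choices`. The largest gap between relative frequency and probability should shrink roughly like \( 1/\sqrt{n} \).

```python
import random
from collections import Counter

def relative_frequency(sample):
    """Fraction of the sample falling into each class."""
    counts = Counter(sample)
    return {k: v / len(sample) for k, v in counts.items()}

# Assumed "true" probabilities (borrowed from the control group for illustration)
probs = {"Never": 0.258, "Ex-smokers": 0.369, "Smokers": 0.373}

rng = random.Random(2011)  # fixed seed so the run is reproducible
gaps = {}
for n in (10, 100, 1_000, 100_000):
    sample = rng.choices(list(probs), weights=list(probs.values()), k=n)
    rf = relative_frequency(sample)
    # largest absolute difference between relative frequency and probability
    gaps[n] = max(abs(rf.get(k, 0.0) - p) for k, p in probs.items())
    print(f"n = {n:>7}: max |RF - P| = {gaps[n]:.4f}")
```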
TABULAR AND GRAPHICAL PRESENTATION
Bar and Pie Charts
pancreatitis.xls
[Figure: bar chart of the percent frequency distribution of smoking status (Never, Ex-smokers, Smokers) for the Pancreatitis and Control groups, and one pie chart per group]
Try to avoid using pie charts in scientific reports. For public/business presentations only!
In MS Excel use the following steps:
Chart Wizard → Columns → set the range (both columns of the percent frequency distribution)
Chart Wizard → Pie → set the range (one column of the percent frequency distribution)
TABULAR AND GRAPHICAL PRESENTATION
Crosstabulation
pancreatitis.xls
Smoking | Disease: other | Disease: pancreatitis | Total
Ex-smokers | 80 | 13 | 93
Never | 56 | 2 | 58
Smokers | 81 | 38 | 119
Total | 217 | 53 | 270
In Excel use the following steps: Data → PivotTable and PivotChart → MS Office list + PivotTable → set the range, including headers → select the output and set the layout by drag-and-dropping names into the table.
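At its core, a PivotTable like this one counts every combination of two categorical variables and adds the margins. The minimal Python sketch below assumes one record per patient. The raw file is not reproduced here, so the records are rebuilt from the counts in the table. The function name `crosstab` is hypothetical.

```python
from collections import Counter

def crosstab(rows, cols):
    """Count co-occurrences of two categorical variables and add margins."""
    pairs = Counter(zip(rows, cols))
    row_levels, col_levels = sorted(set(rows)), sorted(set(cols))
    table = {}
    for r in row_levels:
        table[r] = {c: pairs[(r, c)] for c in col_levels}
        table[r]["Total"] = sum(table[r].values())            # row margin
    table["Total"] = {c: sum(table[r][c] for r in row_levels) for c in col_levels}
    table["Total"]["Total"] = len(rows)                         # grand total
    return table

# One (smoking, disease) record per patient, rebuilt from the counts above
counts = {("Ex-smokers", "other"): 80, ("Ex-smokers", "pancreatitis"): 13,
          ("Never", "other"): 56, ("Never", "pancreatitis"): 2,
          ("Smokers", "other"): 81, ("Smokers", "pancreatitis"): 38}
smoking, disease = [], []
for (s, d), k in counts.items():
    smoking += [s] * k
    disease += [d] * k

table = crosstab(smoking, disease)
print(f"{'Smoking':<11}{'other':>8}{'pancreatitis':>14}{'Total':>7}")
for r, row in table.items():
    print(f"{r:<11}{row['other']:>8}{row['pancreatitis']:>14}{row['Total']:>7}")
```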
TABULAR AND GRAPHICAL PRESENTATION
Example: Mice Data
Data series: Tordoff MG, Bachmanov AA. Survey of calcium & sodium intake and metabolism with bone and body composition. Project symbol: Tordoff3. Accession number: MPD:103.
mice.xls: 790 mice from different strains, http://phenome.jax.org
Parameters: Starting age, Ending age, Starting weight, Ending weight, Weight change, Bleeding time, Ionized Ca in blood, Blood pH, Bone mineral density, Lean tissues weight, Fat weight
TABULAR AND GRAPHICAL PRESENTATION
Histogram
mice.xls
The following are weights in grams for 790 mice (excerpt):
20.5 23.2 24.6 23.5 26 25.9 23.9 22.8 19.9 20.8 22.4 26 23.8 26.5 26 22.8 22.9 20.9 19.8 22.7 31 22.7 26.3 27.1 18.4 21 18.8 21 21.4 25.7 19.7 27 26.2 21.8 22.2 19.2 21.9 22.6 23.7 26.2 26 27.5 25 20.9 20.6 22.1 20 21.1 24.1 28.8 30.2 20.1 24.2 25.8 21.3 21.8 23.7 23.5 28 27.6 21.6 21 21.3 20.1 20.8 24.5 23.8 29.5 21.4 21.5 24 21.1 18.9 19.5 32.3 28 27.1 28.2 22.9 19.9 20.4 21.3 20.6 22.8 25.8 24.1 23.5 24.2 22 20.3
Sorting the weights shows that the values lie between 10 and 49.6 grams. Let us divide the weights into bins:
Weight, g (bin) | Frequency
≤10 | 1
10-20 | 237
20-30 | 417
30-40 | 124
40-50 | 11
More | 0
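Excel's Histogram tool puts each value into the first bin whose upper limit is greater than or equal to that value. Anything above the last limit goes into "More". The minimal Python sketch below follows that rule, using `bisect` from the standard library. It is applied to the first 21 weights listed above, and the function name `histogram` is hypothetical.

```python
import bisect

def histogram(values, upper_limits):
    """Count values per bin: a value goes to the first bin whose upper limit is >= value."""
    limits = sorted(upper_limits)
    counts = [0] * (len(limits) + 1)            # last slot is the "More" bin
    for v in values:
        counts[bisect.bisect_left(limits, v)] += 1
    labels = [str(u) for u in limits] + ["More"]
    return dict(zip(labels, counts))

weights = [20.5, 23.2, 24.6, 23.5, 26, 25.9, 23.9, 22.8, 19.9, 20.8, 22.4,
           26, 23.8, 26.5, 26, 22.8, 22.9, 20.9, 19.8, 22.7, 31]
for label, count in histogram(weights, [10, 20, 30, 40, 50]).items():
    print(f"{label:>4}: {count}")
```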
TABULAR AND GRAPHICAL PRESENTATION
Histogram
Now let us use a bin size of 1 gram:
Bin | Frequency
10 | 1
11 | 13
12 | 12
13 | 25
14 | 29
... | ...
46 | 1
47 | 1
48 | 0
49 | 1
50 | 1
More | 0
[Figure: histogram of mouse weights with 1 g bins; x-axis: Weight, g (10 to 50); y-axis: Frequency]
In Excel use the following steps:
Specify a column of bins (upper limits of the intervals)
Tools → Data Analysis → Histogram; select the input, bins and output (the Analysis ToolPak should be installed)
Use Chart Wizard → Columns to visualize the results
TABULAR AND GRAPHICAL PRESENTATION
Cumulative Frequency Distribution
Cumulative frequency distribution: a tabular summary of quantitative data showing the number of items with values less than or equal to the upper class limit of each class.
[Figure: histogram of mouse weights (Frequency vs. Weight, g) and the corresponding ogive (cumulative relative frequency from 0 to 1 vs. Weight, g)]
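A cumulative frequency distribution is a running sum of the class frequencies. Dividing each running sum by n gives the heights of the ogive. The minimal Python sketch below uses the 10 g bin frequencies for the 790 mice from the histogram table. The function name `cumulative_distribution` is hypothetical.

```python
from itertools import accumulate

def cumulative_distribution(freqs):
    """Running totals and cumulative relative frequencies of class counts."""
    cum = list(accumulate(freqs))     # items <= each class upper limit
    n = cum[-1]
    return cum, [c / n for c in cum]

upper_limits = [10, 20, 30, 40, 50]
freqs = [1, 237, 417, 124, 11]
cum, cum_rel = cumulative_distribution(freqs)
for u, c, r in zip(upper_limits, cum, cum_rel):
    print(f"<= {u} g: {c:3d} mice, {r:.3f}")
```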
TABULAR AND GRAPHICAL PRESENTATION
Scatter Plot
mice.xls
Let us look at the mutual dependency of the Starting and Ending weights.
[Figure: scatter plot of Ending weight vs. Starting weight, both axes from 0 to 50]
In Excel use the following steps: select the region; use Chart Wizard → XY (Scatter).
NUMERICAL MEASURES
Population and sample, measures of location, quantiles, quartiles and percentiles, measures of variability, z-score, detection of outliers, exploratory analysis, box plot, covariation, correlation
NUMERICAL MEASURES
Population and Sample
Population parameter: a numerical value used as a summary measure for a population (e.g., the population mean µ, variance σ², standard deviation σ).
Sample statistic: a numerical value used as a summary measure for a sample (e.g., the sample mean m, sample variance s², sample standard deviation s).
POPULATION: µ mean; σ² variance; N number of elements (usually N = ∞)
SAMPLE: \( \bar{x} \) or m mean; s² variance; n number of elements
Sample: mice.xls, 790 mice from different strains (http://phenome.jax.org). Population: all existing laboratory Mus musculus.
ID | Strain | Sex | Starting age | Ending age | Starting weight | Ending weight | Weight change | Bleeding time | Ionized Ca in blood | Blood pH | Bone mineral density | Lean tissues weight | Fat weight
1 | 129S1/SvImJ | f | 66 | 116 | 19.3 | 20.5 | 1.062 | 64 | 1.2 | 7.24 | 0.0605 | 14.5 | 4.4
2 | 129S1/SvImJ | f | 66 | 116 | 19.1 | 20.8 | 1.089 | 78 | 1.15 | 7.27 | 0.0553 | 13.9 | 4.4
3 | 129S1/SvImJ | f | 66 | 108 | 17.9 | 19.8 | 1.106 | 90 | 1.16 | 7.26 | 0.0546 | 13.8 | 2.9
368 | 129S1/SvImJ | f | 72 | 114 | 18.3 | 21 | 1.148 | 65 | 1.26 | 7.22 | 0.0599 | 15.4 | 4.2
369 | 129S1/SvImJ | f | 72 | 115 | 20.2 | 21.9 | 1.084 | 55 | 1.23 | 7.3 | 0.0623 | 15.6 | 4.3
370 | 129S1/SvImJ | f | 72 | 116 | 18.8 | 22.1 | 1.176 |  | 1.21 | 7.28 | 0.0626 | 16.4 | 4.3
371 | 129S1/SvImJ | f | 72 | 119 | 19.4 | 21.3 | 1.098 | 49 | 1.24 | 7.24 | 0.0632 | 16.6 | 5.4
372 | 129S1/SvImJ | f | 72 | 122 | 18.3 | 20.1 | 1.098 | 73 | 1.17 | 7.19 | 0.0592 | 16 | 4.1
4 | 129S1/SvImJ | f | 66 | 109 | 17.2 | 18.9 | 1.099 | 41 | 1.25 | 7.29 | 0.0513 | 14 | 3.2
5 | 129S1/SvImJ | f | 66 | 112 | 19.7 | 21.3 | 1.081 | 129 | 1.14 | 7.22 | 0.0501 | 16.3 | 5.2
10 | 129S1/SvImJ | m | 66 | 112 | 24.3 | 24.7 | 1.016 | 119 | 1.13 | 7.24 | 0.0533 | 17.6 | 6.8
364 | 129S1/SvImJ | m | 72 | 114 | 25.3 | 27.2 | 1.075 | 64 | 1.25 | 7.27 | 0.0596 | 19.3 | 5.8
365 | 129S1/SvImJ | m | 72 | 115 | 21.4 | 23.9 | 1.117 | 48 | 1.25 | 7.28 | 0.0563 | 17.4 | 5.7
366 | 129S1/SvImJ | m | 72 | 118 | 24.5 | 26.3 | 1.073 | 59 | 1.25 | 7.26 | 0.0609 | 17.8 | 7.1
367 | 129S1/SvImJ | m | 72 | 122 | 24 | 26 | 1.083 | 69 | 1.29 | 7.26 | 0.0584 | 19.2 | 4.6
6 | 129S1/SvImJ | m | 66 | 116 | 21.6 | 23.3 | 1.079 | 78 | 1.15 | 7.27 | 0.0497 | 17.2 | 5.7
7 | 129S1/SvImJ | m | 66 | 107 | 22.7 | 26.5 | 1.167 | 90 | 1.18 | 7.28 | 0.0493 | 18.7 | 7
8 | 129S1/SvImJ | m | 66 | 108 | 25.4 | 27.4 | 1.079 | 35 | 1.24 | 7.26 | 0.0538 | 18.9 | 7.1
9 | 129S1/SvImJ | m | 66 | 109 | 24.4 | 27.5 | 1.127 | 43 | 1.29 | 7.29 | 0.0539 | 19.5 | 7.1
NUMERICAL MEASURES
Measures of Location
Mean: a measure of central location computed by summing the data values and dividing by the number of observations.
\( \mu = \frac{\sum_i x_i}{N} \) (population), \( \bar{x} = m = \frac{\sum_i x_i}{n} \) (sample), \( p = \frac{n_{x\ \mathrm{true}}}{n} \) (proportion of elements for which x is true)
Median: a measure of central location provided by the value in the middle when the data are arranged in ascending order.
Mode: a measure of location, defined as the value that occurs with the greatest frequency.
Weight: 12 16 19 22 23 23 24 32 36 42 63 68
Mode = 23, Median = 23.5, Mean = 31.7
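The three values for the twelve weights can be checked with Python's standard `statistics` module. This is a minimal sketch, and the comments name the Excel functions that play the same role.

```python
import statistics

weight = [12, 16, 19, 22, 23, 23, 24, 32, 36, 42, 63, 68]
mean = statistics.mean(weight)      # Excel: =AVERAGE()
median = statistics.median(weight)  # Excel: =MEDIAN(); even n, so average of 6th and 7th sorted values
mode = statistics.mode(weight)      # Excel: =MODE(); most frequent value
print(f"mean = {mean:.1f}, median = {median}, mode = {mode}")
```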
NUMERICAL MEASURES
Measures of Location
mice.xls
[Figure: histogram of mouse weight (weight, g, 10 to 40; density) with a p.d.f. approximation, marking the median, mean and mode; female proportion p_f = 0.501]
[Figure: density of bleeding time (0 to 200; N = 760, bandwidth = 5.347): median = 55, mean = 61, mode = 48]
In Excel use the following functions: =AVERAGE(), =MEDIAN(), =MODE()
NUMERICAL MEASURES
Quantiles, Quartiles and Percentiles
Percentile: a value such that at least p% of the observations are less than or equal to this value, and at least (100 - p)% of the observations are greater than or equal to this value. The 50th percentile is the median.
Quartiles: the 25th, 50th and 75th percentiles, referred to as the first quartile, the second quartile (median) and the third quartile, respectively.
In Excel use the following function: =PERCENTILE(range, k), where k = p/100 is a fraction between 0 and 1
Weight: 12 16 19 22 23 23 24 32 36 42 63 68
Q1 = 21, Q2 = 23.5, Q3 = 39
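Excel's =PERCENTILE() interpolates linearly between the two closest ranks, at 0-based position k(n - 1). The minimal Python sketch below reproduces that rule (hypothetical function name `percentile_inc`). For these weights it gives Q1 = 21.25, Q2 = 23.5 and Q3 = 37.5. Other textbook conventions, for instance averaging neighbouring values, give slightly different quartiles such as those on the slide.

```python
import math
import statistics

def percentile_inc(values, p):
    """Percentile by linear interpolation between closest ranks, as Excel's
    =PERCENTILE(range, p) does; p is a fraction between 0 and 1."""
    xs = sorted(values)
    rank = p * (len(xs) - 1)          # 0-based position of the percentile
    lo = math.floor(rank)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (rank - lo) * (xs[hi] - xs[lo])

weight = [12, 16, 19, 22, 23, 23, 24, 32, 36, 42, 63, 68]
quartiles = [percentile_inc(weight, p) for p in (0.25, 0.50, 0.75)]
print("Q1, Q2, Q3 =", quartiles)
# The standard library's "inclusive" method uses the same convention
print(statistics.quantiles(weight, n=4, method="inclusive"))
```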
NUMERICAL MEASURES
Measures of Variability
Interquartile range (IQR): a measure of variability, defined to be the difference between the third and first quartiles: IQR = Q3 - Q1.
Variance: a measure of variability based on the squared deviations of the data values about the mean.
Standard deviation: a measure of variability computed by taking the positive square root of the variance.
Population: \( \sigma^2 = \frac{\sum_i (x_i - \mu)^2}{N} \), \( \sigma = \sqrt{\sigma^2} \)
Sample: \( s^2 = \frac{\sum_i (x_i - m)^2}{n - 1} \), \( s = \sqrt{s^2} \)
Weight: 12 16 19 22 23 23 24 32 36 42 63 68
IQR = 18, Variance = 320.2, St. dev. = 17.9
In Excel use the following functions: =VAR(), =STDEV()
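The difference between the sample formula (divide by n - 1) and the population formula (divide by N) can be seen directly with the `statistics` module. This is a minimal sketch. `statistics.variance` and `statistics.stdev` are the sample versions, like Excel's =VAR() and =STDEV(). `statistics.pvariance` is the population version, like Excel's =VARP().

```python
import statistics

weight = [12, 16, 19, 22, 23, 23, 24, 32, 36, 42, 63, 68]
s2 = statistics.variance(weight)       # sample variance, divides by n - 1
s = statistics.stdev(weight)           # sample standard deviation, sqrt(s2)
sigma2 = statistics.pvariance(weight)  # population variance, divides by N
print(f"s^2 = {s2:.1f}, s = {s:.1f}, sigma^2 = {sigma2:.1f}")
```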
NUMERICAL MEASURES
Measures of Variability
Coefficient of variation: a measure of relative variability computed by dividing the standard deviation by the mean and multiplying by 100%: CV = (standard deviation / mean) × 100%.
Weight: 12 16 19 22 23 23 24 32 36 42 63 68; CV = 57%
Median absolute deviation (MAD): a robust measure of the variability of a univariate sample of quantitative data.
\( \mathrm{MAD} = \mathrm{median}_i\left(\left|x_i - \mathrm{median}(x)\right|\right) \)
Set 1: 23 12 22 12 21 18 22 20 12 19 14 13 17
Set 2: 23 12 22 12 21 81 22 20 12 19 14 13 17
Measure | Set 1 | Set 2
Mean | 17.3 | 22.2
Median | 18 | 19
St.dev. | 4.23 | 18.18
MAD | 5.93 | 5.93
The MAD values in the table are multiplied by the constant 1.4826, which makes MAD comparable with the standard deviation for normally distributed data; the unscaled median of \( |x_i - \mathrm{median}(x)| \) equals 4 for both sets.
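The minimal Python sketch below computes the coefficient of variation and the MAD for the data above. The function names are hypothetical. The 1.4826 scale factor reproduces the 5.93 in the table, and `scale=1` gives the raw median of absolute deviations. The sketch also shows the outlier 81 inflating the standard deviation but leaving MAD unchanged.

```python
import statistics

def coefficient_of_variation(values):
    """Standard deviation divided by the mean, in percent."""
    return statistics.stdev(values) / statistics.mean(values) * 100

def mad(values, scale=1.4826):
    """Median of |x - median(x)|, multiplied by `scale` (1.4826 ~ sd for normal data)."""
    med = statistics.median(values)
    return scale * statistics.median([abs(x - med) for x in values])

weight = [12, 16, 19, 22, 23, 23, 24, 32, 36, 42, 63, 68]
print(f"CV = {coefficient_of_variation(weight):.0f}%")

set1 = [23, 12, 22, 12, 21, 18, 22, 20, 12, 19, 14, 13, 17]
set2 = [23, 12, 22, 12, 21, 81, 22, 20, 12, 19, 14, 13, 17]
for name, data in (("Set 1", set1), ("Set 2", set2)):
    print(name, "st.dev.", round(statistics.stdev(data), 2), "MAD", round(mad(data), 2))
```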
NUMERICAL MEASURES
Measures of Variability
Skewness: a measure of the shape of a data distribution. Data skewed to the left result in negative skewness; a symmetric data distribution results in zero skewness; data skewed to the right result in positive skewness.
\( \mathrm{Skewness} = \frac{n}{(n-1)(n-2)} \sum_i \left(\frac{x_i - m}{s}\right)^3 \)
(adapted from Anderson et al., Statistics for Business and Economics)
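The minimal Python sketch below implements the skewness formula above. The function name `skewness` is hypothetical. Excel's =SKEW() uses the same adjusted formula, and at least three values are needed. For the right-skewed weights the result is positive, and for symmetric data it is zero.

```python
import statistics

def skewness(values):
    """Sample skewness n/((n-1)(n-2)) * sum(((x - m)/s)**3); needs n >= 3."""
    n = len(values)
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in values)

weight = [12, 16, 19, 22, 23, 23, 24, 32, 36, 42, 63, 68]
print(round(skewness(weight), 2))   # long right tail, so positive
print(skewness([1, 2, 3]))          # symmetric data
```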
NUMERICAL MEASURES
z-score and Detection of Outliers
z-score: a value computed by dividing the deviation about the mean \( (x_i - \bar{x}) \) by the standard deviation s. A z-score is referred to as a standardized value and denotes the number of standard deviations \( x_i \) is from the mean.
\( z_i = \frac{x_i - m}{s} \)
Chebyshev's theorem: for any data set, at least \( (1 - 1/z^2) \) of the data values must be within z standard deviations from the mean, where z is any value > 1.
Weight | z-score
12 | -1.10
16 | -0.88
19 | -0.71
22 | -0.54
23 | -0.48
23 | -0.48
24 | -0.43
32 | 0.02
36 | 0.24
42 | 0.58
63 | 1.75
68 | 2.03
For ANY distribution:
At least 75% of the values are within z = 2 standard deviations from the mean
At least 89% of the values are within z = 3 standard deviations from the mean
At least 94% of the values are within z = 4 standard deviations from the mean
At least 96% of the values are within z = 5 standard deviations from the mean
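The minimal Python sketch below reproduces the z-score column and checks Chebyshev's bound on the same twelve weights. For each k it compares the observed share of values within k standard deviations against the guaranteed minimum \( 1 - 1/k^2 \).

```python
import statistics

weight = [12, 16, 19, 22, 23, 23, 24, 32, 36, 42, 63, 68]
m, s = statistics.mean(weight), statistics.stdev(weight)
z = [(x - m) / s for x in weight]          # standardized values
print([round(v, 2) for v in z])

# Chebyshev: at least 1 - 1/k**2 of any data set lies within k standard deviations
checks = []
for k in (2, 3, 4, 5):
    share = sum(abs(v) <= k for v in z) / len(z)
    bound = 1 - 1 / k**2
    checks.append((k, share, bound))
    print(f"k = {k}: {share:.0%} within, guaranteed at least {bound:.0%}")
```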
NUMERICAL MEASURES
Detection of Outliers
For bell-shaped distributions:
Approximately 68% of the values are within 1 st.dev. from the mean
Approximately 95% of the values are within 2 st.dev. from the mean
Almost all points are inside 3 st.dev. from the mean
Outlier: an unusually small or unusually large value. For bell-shaped distributions, points with |z| > 3 can be considered outliers.
Example: Gaussian distribution
Weight | z-score
23 | 0.04
12 | -0.53
22 | -0.01
12 | -0.53
21 | -0.06
81 | 3.10
22 | -0.01
20 | -0.11
12 | -0.53
19 | -0.17
14 | -0.43
13 | -0.48
17 | -0.27
NUMERICAL MEASURES
Task: Detection of Outliers
mice.xls
Using Excel, try to identify outlier mice on the basis of the Weight change variable.
\( z_i = \frac{x_i - m}{s} \)
For bell-shaped distributions, points with |z| > 3 can be considered outliers.
In Excel use the following functions:
=AVERAGE() for the mean, m
=STDEV() for the standard deviation, s
=ABS() for the absolute value
Sort by |z| to identify outliers.
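The same procedure (mean, standard deviation, |z|, then sort) can be scripted. mice.xls is not reproduced here, so this minimal Python sketch runs on Set 2 from the MAD example, which contains the value 81. The function name `flag_outliers` and the default threshold of 3 are illustrative choices.

```python
import statistics

def flag_outliers(values, threshold=3.0):
    """Return values with |z| > threshold, plus all (value, z) pairs sorted by |z|."""
    m = statistics.mean(values)    # Excel: =AVERAGE()
    s = statistics.stdev(values)   # Excel: =STDEV()
    scored = [(x, (x - m) / s) for x in values]
    scored.sort(key=lambda pair: abs(pair[1]), reverse=True)  # Excel: =ABS(), then sort
    outliers = [x for x, z in scored if abs(z) > threshold]
    return outliers, scored

set2 = [23, 12, 22, 12, 21, 81, 22, 20, 12, 19, 14, 13, 17]
outliers, scored = flag_outliers(set2)
print("outliers:", outliers)
print("largest |z|:", round(scored[0][1], 2))
```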
NUMERICAL MEASURES
Exploratory Data Analysis
Five-number summary: an exploratory data analysis technique that uses five numbers to summarize the data: smallest value, first quartile, median, third quartile, largest value.
children.xls: Min. = 12, Q1 = 25, Median = 32, Q3 = 46, Max. = 79
In Excel use: Tools → Data Analysis → Descriptive Statistics
Box plot: a graphical summary of data based on a five-number summary.
[Figure: box plot showing Min, Q1, Q2, Q3 and Max, with whiskers limited to 1.5 IQR]
In Excel use (indirectly): Chart Wizard → Stock → Open-high-low-close, with open = Q3, high = Q3 + 1.5·IQR, low = Q1 - 1.5·IQR, close = Q1
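The minimal Python sketch below computes a five-number summary and lists the values beyond the 1.5·IQR fences. children.xls is not reproduced here, so it reuses the twelve weights from the measures-of-location slides. It assumes Excel-style interpolated quartiles (`method="inclusive"`). The function names are hypothetical.

```python
import statistics

def five_number_summary(values):
    """Min, Q1, median, Q3, max; quartiles interpolated as in Excel's PERCENTILE."""
    xs = sorted(values)
    q1, q2, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    return xs[0], q1, q2, q3, xs[-1]

def outside_fences(values, k=1.5):
    """Values further than k * IQR below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in values if x < low or x > high]

weight = [12, 16, 19, 22, 23, 23, 24, 32, 36, 42, 63, 68]
print("Min, Q1, Median, Q3, Max:", five_number_summary(weight))
print("beyond the whiskers:", outside_fences(weight))
```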
NUMERICAL MEASURES
Example: Mice Weight
Build a box plot for the weights of male and female mice (mice.xls).
1. Build 5-number summaries for males and females:
Statistic | Female | Male
Min | 10.0 | 12.0
Q1 | 17.2 | 23.8
Q2 | 20.7 | 27.1
Q3 | 23.3 | 31.2
Max | 41.5 | 49.6
2. Combine the numbers in the following order:
open = Q3
high = min(Q3 + 1.5·(Q3 - Q1), Max)
low = max(Q1 - 1.5·(Q3 - Q1), Min)
close = Q1
In Excel use: Chart Wizard → Stock → Open-high-low-close; put series in rows; adjust colors, etc.
[Figure: "Mouse weight" box plots (Weight, g) for Female and Male]
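The minimal Python sketch below scripts the combination step, using the male and female summaries from the table. Each whisker stops at the data extreme or 1.5·IQR beyond the box, whichever is closer to the box. The function name `stock_chart_series` is hypothetical.

```python
def stock_chart_series(mn, q1, q2, q3, mx):
    """Open-high-low-close values for drawing a box plot as an Excel stock chart.

    The median (q2) is not part of the OHLC series; it is drawn separately.
    """
    iqr = q3 - q1
    return {
        "open": q3,
        "high": min(q3 + 1.5 * iqr, mx),   # upper whisker, capped at the maximum
        "low": max(q1 - 1.5 * iqr, mn),    # lower whisker, capped at the minimum
        "close": q1,
    }

summaries = {"Female": (10.0, 17.2, 20.7, 23.3, 41.5),
             "Male": (12.0, 23.8, 27.1, 31.2, 49.6)}
for sex, summary in summaries.items():
    series = stock_chart_series(*summary)
    print(sex, {k: round(v, 2) for k, v in series.items()})
```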
NUMERICAL MEASURES
Measure of Association between 2 Variables
Covariance: a measure of linear association between two variables. Positive values indicate a positive relationship; negative values indicate a negative relationship.
Population: \( \sigma_{xy} = \frac{\sum_i (x_i - \mu_x)(y_i - \mu_y)}{N} \)
Sample: \( s_{xy} = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{n - 1} \)
mice.xls
[Figure: scatter plot of Ending weight vs. Starting weight]
In Excel use the function =COVAR(). Note that it divides by n, i.e. it returns the population covariance; Excel 2010 and later also offer =COVARIANCE.S() for the sample covariance.
s_xy = 39.8, which is hard to interpret.
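One reason the covariance is hard to interpret is that it carries the units of both variables. The minimal Python sketch below illustrates this with the five female rows with IDs 1, 2, 3, 368 and 369 from the data table shown earlier. Expressing the weights in kilograms instead of grams shrinks \( s_{xy} \) by a factor of \( 10^6 \), even though the relationship is unchanged. The function name `covariance` is hypothetical.

```python
def covariance(x, y, sample=True):
    """Sum of cross-products of deviations, divided by n - 1 (sample) or N (population)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / (n - 1) if sample else sxy / n

start = [19.3, 19.1, 17.9, 18.3, 20.2]   # Starting weight, g
end = [20.5, 20.8, 19.8, 21.0, 21.9]     # Ending weight, g
cov_g = covariance(start, end)
cov_kg = covariance([v / 1000 for v in start], [v / 1000 for v in end])
print(f"s_xy = {cov_g:.4f} g^2 = {cov_kg:.3e} kg^2")
```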
NUMERICAL MEASURES
Measure of Association between 2 Variables
Correlation (Pearson product moment correlation coefficient): a measure of linear association between two variables that takes on values between -1 and +1. Values near +1 indicate a strong positive linear relationship, values near -1 indicate a strong negative linear relationship, and values near zero indicate the lack of a linear relationship.
Population: \( \rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y} = \frac{\sum_i (x_i - \mu_x)(y_i - \mu_y)}{N \sigma_x \sigma_y} \)
Sample: \( r_{xy} = \frac{s_{xy}}{s_x s_y} = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{(n - 1) s_x s_y} \)
mice.xls
[Figure: scatter plot of Ending weight vs. Starting weight]
In Excel use the function =CORREL(): r_xy = 0.94
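Dividing by \( s_x s_y \) removes the units, so r does not change when grams become kilograms. The minimal Python sketch below uses the same five female rows (IDs 1, 2, 3, 368, 369) from the data table. The function name `correlation` is hypothetical.

```python
import math

def correlation(x, y):
    """Pearson r = s_xy / (s_x * s_y); the (n - 1) factors cancel."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

start = [19.3, 19.1, 17.9, 18.3, 20.2]   # Starting weight, g
end = [20.5, 20.8, 19.8, 21.0, 21.9]     # Ending weight, g
r_g = correlation(start, end)
r_kg = correlation([v / 1000 for v in start], [v / 1000 for v in end])
print(round(r_g, 3), round(r_kg, 3))     # same value in both units
```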
NUMERICAL MEASURES
Correlation Coefficient
If we have only 2 points in the x and y sets, what values would you expect for the correlation between x and y?
[Figure: example scatter plots with their correlation coefficients; source: Wikipedia]
NUMERICAL MEASURES
Weighted Mean
Weighted mean: the mean obtained by assigning each observation a weight that reflects its importance.
\( m = \frac{\sum_i w_i x_i}{\sum_i w_i} \)
As an example of the need for a weighted mean, consider a sample of five purchases of a raw material over several months. Note that the cost per pound varies from $2.80 to $3.40, and the quantity purchased has varied from 500 to 2750 pounds. Suppose that a manager asked for information about the mean cost per pound of the raw material. If we used the simple mean of the cost per pound, we would overestimate the average cost!
(Anderson et al., Statistics for Business and Economics)
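The minimal Python sketch below makes the point concrete. The five purchases are hypothetical values chosen within the ranges quoted above, not the textbook's table. The simple mean ignores that the cheapest purchase was also the largest, so it comes out higher than the quantity-weighted mean. The function name `weighted_mean` is hypothetical.

```python
def weighted_mean(values, weights):
    """Sum of w_i * x_i divided by the sum of w_i."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Hypothetical purchases: cost per pound ($) and pounds bought
cost = [3.00, 3.40, 2.80, 2.90, 3.25]
pounds = [1200, 500, 2750, 1000, 800]

simple = sum(cost) / len(cost)
weighted = weighted_mean(cost, pounds)
print(f"simple mean ${simple:.2f}, weighted mean ${weighted:.2f}")
```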
NUMERICAL MEASURES
Grouped Mean
Grouped data: data available in class intervals as summarized by a frequency distribution. Individual values of the original data are not available.
children.xls
Bin | Frequency
20 | 5
30 | 21
40 | 8
50 | 14
60 | 3
70 | 4
80 | 2
More | 0
Mean for grouped data: \( m = \frac{\sum_{i=1}^{k} f_i M_i}{n} \)
Variance for grouped data: \( s^2 = \frac{\sum_{i=1}^{k} f_i (M_i - m)^2}{n - 1} \)
where \( M_i \) is the midpoint of class i, \( f_i \) its frequency, k the number of classes and \( n = \sum_i f_i \).
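The minimal Python sketch below applies the grouped formulas to the children.xls frequency table. It assumes that every class is 10 units wide, so the first class runs from 10 to 20. Each midpoint is then the upper limit minus 5. The function name `grouped_mean_var` is hypothetical.

```python
def grouped_mean_var(midpoints, freqs):
    """Grouped-data mean and sample variance from class midpoints and frequencies."""
    n = sum(freqs)
    m = sum(f * M for f, M in zip(freqs, midpoints)) / n
    s2 = sum(f * (M - m) ** 2 for f, M in zip(freqs, midpoints)) / (n - 1)
    return n, m, s2

upper = [20, 30, 40, 50, 60, 70, 80]
freqs = [5, 21, 8, 14, 3, 4, 2]
midpoints = [u - 5 for u in upper]   # assumption: equal class width of 10
n, m, s2 = grouped_mean_var(midpoints, freqs)
print(f"n = {n}, m = {m:.2f}, s^2 = {s2:.1f}")
```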
QUESTIONS?
Thank you for your attention!
To be continued...