Chapter 5. Describing numerical data

Similar documents
7. Bivariate Graphing

Here are the various choices. All of them are found in the Analyze menu in SPSS, under the sub-menu for Descriptive Statistics :

2.4.1 STA-O Assessment 2

Population. Sample. AP Statistics Notes for Chapter 1 Section 1.0 Making Sense of Data. Statistics: Data Analysis:

Daniel Boduszek University of Huddersfield

CHAPTER 3 Describing Relationships

Understandable Statistics

Probability and Statistics. Chapter 1

ANOVA in SPSS (Practical)

Using SPSS for Correlation

Chapter 1: Exploring Data

Lab 7 (100 pts.): One-Way ANOVA Objectives: Analyze data via the One-Way ANOVA

Unit 1 Exploring and Understanding Data

Stat Wk 8: Continuous Data

Introduction to Statistical Data Analysis I

One-Way Independent ANOVA

Examining differences between two sets of scores

Table of Contents. Plots. Essential Statistics for Nursing Research 1/12/2017

MA 151: Using Minitab to Visualize and Explore Data The Low Fat vs. Low Carb Debate

WDHS Curriculum Map Probability and Statistics. What is Statistics and how does it relate to you?

Undertaking statistical analysis of

STT315 Chapter 2: Methods for Describing Sets of Data - Part 2

Q: How do I get the protein concentration in mg/ml from the standard curve if the X-axis is in units of µg.

Department of Statistics TEXAS A&M UNIVERSITY STAT 211. Instructor: Keith Hatfield

AP Statistics. Semester One Review Part 1 Chapters 1-5

Stats 95. Statistical analysis without compelling presentation is annoying at best and catastrophic at worst. From raw numbers to meaningful pictures

Unit 7 Comparisons and Relationships

Intro to SPSS. Using SPSS through WebFAS

Lesson 9 Presentation and Display of Quantitative Data

Medical Statistics 1. Basic Concepts Farhad Pishgar. Defining the data. Alive after 6 months?

What you should know before you collect data. BAE 815 (Fall 2017) Dr. Zifei Liu

Explore. sexcntry Sex according to country. [DataSet1] D:\NORA\NORA Main File.sav

Outline. Practice. Confounding Variables. Discuss. Observational Studies vs Experiments. Observational Studies vs Experiments

How to interpret scientific & statistical graphs

Before we get started:

STAT Factor Analysis in SAS

SPSS Correlation/Regression

List of Figures. List of Tables. Preface to the Second Edition. Preface to the First Edition

Quantitative Methods in Computing Education Research (A brief overview tips and techniques)

SPRING GROVE AREA SCHOOL DISTRICT. Course Description. Instructional Strategies, Learning Practices, Activities, and Experiences.

Empirical Rule ( rule) applies ONLY to Normal Distribution (modeled by so called bell curve)

V. Gathering and Exploring Data

NORTH SOUTH UNIVERSITY TUTORIAL 1

4.3 Measures of Variation

Section 6: Analysing Relationships Between Variables

Lab #7: Confidence Intervals-Hypothesis Testing (2)-T Test

Research Methods in Forest Sciences: Learning Diary. Yoko Lu December Research process

Chapter 1. Picturing Distributions with Graphs

MULTIPLE LINEAR REGRESSION 24.1 INTRODUCTION AND OBJECTIVES OBJECTIVES

Business Statistics Probability

Statistical Methods Exam I Review

PRINTABLE VERSION. Quiz 1. True or False: The amount of rainfall in your state last month is an example of continuous data.

Day 11: Measures of Association and ANOVA

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo

Types of Statistics. Censored data. Files for today (June 27) Lecture and Homework INTRODUCTION TO BIOSTATISTICS. Today s Outline

Summarizing Data. (Ch 1.1, 1.3, , 2.4.3, 2.5)

CRITERIA FOR USE. A GRAPHICAL EXPLANATION OF BI-VARIATE (2 VARIABLE) REGRESSION ANALYSISSys

bivariate analysis: The statistical analysis of the relationship between two variables.

LOTS of NEW stuff right away 2. The book has calculator commands 3. About 90% of technology by week 5

isc ove ring i Statistics sing SPSS

From Biostatistics Using JMP: A Practical Guide. Full book available for purchase here. Chapter 1: Introduction... 1

CHAPTER ONE CORRELATION

1. To review research methods and the principles of experimental design that are typically used in an experiment.

Test 1C AP Statistics Name:

Two-Way Independent ANOVA

Section I: Multiple Choice Select the best answer for each question.

Chapter 1: Explaining Behavior

Survey research (Lecture 1) Summary & Conclusion. Lecture 10 Survey Research & Design in Psychology James Neill, 2015 Creative Commons Attribution 4.

Survey research (Lecture 1)

Psychology of Perception Psychology 4165, Fall 2001 Laboratory 1 Weight Discrimination

UNIVERSITY OF TORONTO SCARBOROUGH Department of Computer and Mathematical Sciences Midterm Test February 2016

Psychology of Perception Psychology 4165, Spring 2003 Laboratory 1 Weight Discrimination

MINUTE TO WIN IT: NAMING THE PRESIDENTS OF THE UNITED STATES

M 140 Test 1 A Name SHOW YOUR WORK FOR FULL CREDIT! Problem Max. Points Your Points Total 60

Chapter 9. Factorial ANOVA with Two Between-Group Factors 10/22/ Factorial ANOVA with Two Between-Group Factors

QPM Lab 9: Contingency Tables and Bivariate Displays in R

Lecture Outline. Biost 517 Applied Biostatistics I. Purpose of Descriptive Statistics. Purpose of Descriptive Statistics

Hour 2: lm (regression), plot (scatterplots), cooks.distance and resid (diagnostics) Stat 302, Winter 2016 SFU, Week 3, Hour 1, Page 1

Statistics. Nur Hidayanto PSP English Education Dept. SStatistics/Nur Hidayanto PSP/PBI

SCATTER PLOTS AND TREND LINES

10/4/2007 MATH 171 Name: Dr. Lunsford Test Points Possible

1) What is the independent variable? What is our Dependent Variable?

The Effectiveness of Captopril

STP226 Brief Class Notes Instructor: Ela Jackiewicz

Still important ideas

Preliminary Report on Simple Statistical Tests (t-tests and bivariate correlations)

Lesson 1: Distributions and Their Shapes

Descriptive statistics

Readings: Textbook readings: OpenStax - Chapters 1 13 (emphasis on Chapter 12) Online readings: Appendix D, E & F

How to Conduct On-Farm Trials. Dr. Jim Walworth Dept. of Soil, Water & Environmental Sci. University of Arizona

Summary & Conclusion. Lecture 10 Survey Research & Design in Psychology James Neill, 2016 Creative Commons Attribution 4.0

INTERPRET SCATTERPLOTS

LAB ASSIGNMENT 4 INFERENCES FOR NUMERICAL DATA. Comparison of Cancer Survival*

Chapter 3: Describing Relationships

NEUROBLASTOMA DATA -- TWO GROUPS -- QUANTITATIVE MEASURES 38 15:37 Saturday, January 25, 2003

Impact of Group Rejections on Self Esteem

Still important ideas

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo

MULTIPLE OLS REGRESSION RESEARCH QUESTION ONE:

Knowledge discovery tools 381

Transcription:

Chapter 5 Describing numerical data 1

Topics Numerical descriptive measures Location Variability Other measurements Graphical methods Histogram Boxplot, Stem and leaf plot Scatter plot for bivariate data 2

Exploring numerical data The number of observations in the data set The center of the data Mean Median The variability of the data Variance or standard deviation Range Interquartile range Coefficient of Variation 3

Other measures Minimum and Maximum Exploring numerical data First quartile (25th percentile) and third quartile (75th percentile) Percentiles Skewness Kurtosis 4

Example 5.1 The percentage returns achieved by 76 average risk funds 21.88 59.82 33.28 20.23 15.78 13.59 32.98 11.31 21.46 14.45 25.43 10.67 12.11 17.69 56.87 4.43 24.95 20.13 8.75 12.45 19.89 37.71 49.97 19.67 55.4 27.16 39.44 19.45 15.42 55.9 28.25 31.03 31.15 41.86 49.76 24.89 11.37 18.19 32.5 7.57 29.18 31.7 67.69 10.09 61.88 24.08 7.97 24.22 18.18 24.21 31.96 45.39 56.63 14.36 10 45.37 27.16-2.7 11.38 12.67 12.41 91.15 8.4 30.52 20.26 4.08 26.09 12.21 67.45 43.31 19.21 18.49 21.31 25.18 26.85 18.89 5

Some preliminary observations What can we say about the returns of these funds? Almost all the funds have a positive return. The spread of the returns ranges from -2.7% to 91.15%. There appears to be one very high return, 91.15%. 6

Descriptive statistics: SAS SAS:Proc Univariate Program data ex5 1ar; infile F:\ST2137\lecdata\ex5 1ar.txt firstobs=2; input return; proc univariate data=ex5 1ar; title Simple Descriptive Statistics ; var return; run; 7

Output from Proc Univariate 8

Output from Proc Univariate 9

Output from Proc Univariate 10

Output from Proc Univariate 11

Output from Proc Univariate 12

SAS: Proc Mean proc means data=ex5 1ar n mean std min max stderr maxdec=4; title Procedure Means ; var return; run; 13

Output from Proc Mean 14

Descriptive statistics: R > ex5.1=read.table ( F:/ST2137/lecdata/ex5 1ar.txt,header=T) > ex5.1ar=ex5.1[,1] > summary(ex5.1ar) Min 1st Qu Median Mean 3rd Qu. Max -2.70 14.17 22.98 27.00 32.62 91.15 > mean(ex5.1ar) [1] 27.00079 > median(ex5.1ar) [1]22.98 > min(ex5.1ar) [1] -2.7 > max(ex5.1ar) [1] 91.15 15

Descriptive statistics: R > range(ex5.1ar) [1] -2.70 91.15 > quantile(ex5.1ar) 0% 25% 50% 75% 100% -2.7000 14.1675 22.9800 32.6200 91.1500 > var(ex5.1ar) [1] 312.0435 > sd(ex5.1ar) [1] 17.66475 > IQR(ex5.1ar) #interquartile range [1] 18.4525 16

Descriptive statistics: R > ex5.1ar[order(ex5.1ar)[1:5]] #the smallest 5 observations [1] -2.70 4.08 4.43 7.57 7.97 > nar=length(ex5.1ar) > ex5.1ar[order(ex5.1ar)[(nar-5):nar]] #the biggest 5 observations [1] 56.87 59.82 61.88 67.45 67.69 91.15 > cv=function(x)sd(x)/mean(x)*100 # Compute coefficient of variation (ex5.1ar) > cv(ex5.1ar) return 65.42 17

Calculating sample skewness: R Unbiased estimator of skewness is given by n(n 1) n 2 > skew=function(x){ n=length(x) m3=sum((x-mean(x)) 3)/n m2=sum((x-mean(x)) 2)/n m3/m2 (3/2) sqrt(n (n-1))/(n-2)} > skew(ex5.1ar) [1] 1.216081 ( m 3 ) m 3/2 2 18

Calculating sample kurtosis: R Unbiased estimator of kurtosis is given by (n 1) (n 2)(n 3) ((n + 1)m 4 m 2 2 3(n 1)) > kurt=function(x){ n=length(x) m4=sum((x-mean(x)) 4)/n m2=sum((x-mean(x)) 2)/n (n-1)/((n-2)*(n-3))*((n+1)*m4/m2 2-3*(n-1))} > kurt(ex5.1ar) [1] 1.545188 19

Descriptive Statistics: SPSS Import ex5.1ar.txt into the SPSS and call it ex5.1ar.sav Analyze Descriptive Statistics Descriptive... Move the variable return from the left panel to the right panel Click Option Choose the statistics that you want to compute Click Continue OK 20

Descriptive Statistics: SPSS 21

Output of descriptive statistics: SPSS Descriptive Statistics N Range Minimum Maximum Return Statistic Statistic Statistic Statistic 76 93.85-2.70 91.15 Mean Std Deviation Variance Statistic Std.Error Statistic Statistic 27.0008 2.02629 17.666475 312.043 Skewness Kurtosis Statistic Std.Error Statistic Std.Error 1.216.276 1.545.545 22

Graphical Methods: SAS Box Plot Stem and leaf plot Histogram QQ Plot 23

Plot in Proc Univariate: SAS Program /*Boxplot, stem and leaf plot and qq plot */ proc univariate data=ex5 1ar plot; var return; run; 24

Plot in Proc Univariate: SAS output 25

Plot in Proc Univariate: SAS output 26

Program /*Histogram and qqplot */ options reset=all ftext= Arial/bo cback=white colors=(black) gunit=pct htext=2 hpos=15; Histogram: SAS 27

Histogram: SAS Program proc univariate data=ex5 1ar; var return; histogram return/normal; /*Option normal creates a normal curve */ inset mean= Mean std= Standard Deviation /font= Arial pos=ne height=4; qqplot return; run; quit; 28

Output: Histogram in SAS 29

Output: Histogram in SAS 30

Histogram: R > hist(return,include.lowest=true,freq=true, main=paste( Histogram of return ), xlab= return, ylab= frequency, axes=true) > # Normal curve imposed on the histogram >xpt=seq(-10,100,0.1) >ypt=dnorm(seq(-10,100,0.1),mean(return),sd(return)) >ypt=ypt*length(return)*10 >lines(xpt,ypt) 31

32

QQplot: R > # qqplot of normal distribution > qqnorm(return) >qqline(return) 33

Stem and leaf plot: R > stem(return) 34

Boxplot: R >boxplot(return) 35

Plot: SPSS Open the SPSS data file ex5.1ar.sav Analyze Descriptive Statistics Explore.. Click plots Choose the plots you want Click Continue OK 36

Plot: SPSS 37

Output: Histogram in SPSS 38

Output: QQ plot in SPSS 39

Output: Boxplot in SPSS 40

Alternative way to get histogram in SPSS Open the SPSS data file ex5.1ar.sav Graphs Histogram... Move the variable return from the left panel to the right panel under Variable Choose Display normal curve, then click OK 41

Alternative way to get QQplot in SPSS Open the SPSS data file ex5.1ar.sav Analyze Descriptive Statistics QQ plot. Move the variable return from the left panel to the right panel under Variable Choose Normal in the Test distribution, then click OK 42

Alternative way to get Boxplot in SPSS Open the SPSS data file ex5.1ar.sav Graphs Legacy Dialogs Box plot... Choose Simple in the Summaries in separate variables, then click Define Move the variable return from the left panel to the right panel under Variable Click OK 43

Example 5.2 In addition to the percentage returns of the 76 average-risk funds, we have the following percentage returns achieved by 47 high-risk funds: 24.47 8.76 58.71 35.07-22.82 15.67 37.47 13.4 49.02-3.16 43.97 13.91 23.84-2.89 39.64 29.72 17.91-10.55 47.38 68.58 86.13-12.57-0.33-5.32 4 0.14 45.97 23.87 38.23 63.79 26.6 6.62 6.72 36.02 29.51 22.56 1.62 29.33 28.91 29.32 42.91 8.87 54.43 28.31 30.76 49.76 1.98 44

Preparing the dataset For a given fund, there are two variables: Return: Percentage return of the fund and Risk: 1 for average-risk fund and 2 for high-risk fund. Notice that the variable Risk is a classification (categorical) variable. 45

How to enter the data Two columns with the first column representing the return percentage while the second column identifies the type of funds There are 123 data in both columns. 46

Descriptive Statistics by groups: SAS proc format; value $risk 1 = Average Risk 2 = High Risk ; data ex5 2; infile F:\ST2137\lecdata\ex5 2.txt ; input return risk$; label return= Return Percentage ; format risk $risk.; 47

Descriptive Statistics by groups: SAS proc univariate data=ex5 2 plot; histogram return/midpoints=-30 to 90 by 15 normal; class risk; inset mean= Mean std= Standard Deviation /font= Arial pos=nw height=3; qqplot return; run; quit; 48

Output by groups: SAS 49

Output by groups: SAS 50

proc boxplot data=ex5 2; plot return*risk; run; Boxplot by groups: SAS m 51

Descriptive statistics by groups: R >ex5.2=read.table( G:/ST2137/lecdata/ex5 2.txt,header=T) >attach(ex5.2) >ex5.2ar=ex5.2[risk==1, Return ] >ex5.2hr=ex5.2[risk==2, Return ] >summary(ex5.2hr) Min 1st Qu Median Mean 3rd Qu. Max -22.82 6.67 26.60 24.81 38.94 86.13 52

>stem(ex5.2hr) Stem and leaf plot by groups: R 53

>boxplot(return risk) Boxplot by groups: R 20 0 20 40 60 80 1 2 54

Descriptive statistics by groups: SPSS Import ex5.2.txt into the SPSS and call it ex5.2.sav Analyze Descriptive Statistics Explore... Move Return from the left panel to the Dependent List panel Move Risk from left panel to the Factor List panel 55

Descriptive statistics by groups: SPSS Click Plots Choose the plots that you want Click Continue OK 56

Boxplot by groups:spss 57

Plot of bivariate data: SAS Refer to the data set htwt1 in Tutorial 1 data htwt1; infile F:\ST2137\tutorialdata\htwt1.txt ; input id gender$ height weight siblings; proc gplot data=htwt1; title Scatter Plot of WEIGHT by HEIGHT ; plot weight* height; symbol value=dot color=black; run; 58

Scatter plot of bivariate data: SAS 59

Plot of bivariate data: SAS proc gplot data=htwt1; title Using gender to generate the plotting symbols ; plot weight height=gender; symbol1 value=circle color=red; symbol2 value=square color=black; run; 60

Scatter plot of bivariate data for groups: SAS 61

Plot of bivariate data: SAS proc sort data=htwt1; by gender; proc gplot data=htwt1; title Separate Plots by GENDER ; by gender; plot weight*height; run; 62

Scatter plot of bivariate data for groups: SAS 63

Scatter plot of bivariate data for groups: SAS 64

Plot of bivariate data: R > htwt1=read.table( F:/ST2137/tutorialdata/htwt1.txt,header=TRUE) > names(htwt1)=c( id, gender, height, weight, siblings ) > attach(htwt1) > plot(height,weight,main= Plot of Weight by Height ) 65

Scatter plot of bivariate data for groups: R Plot of Weight by Height weight 40 50 60 70 80 150 160 170 180 height 66

Plot of bivariate data: R >plot(height[gender== M ],weight[gender== M ], +main= Use Gender to geneerate the plotting symbol, +ylab= Weight,xlab= Height, +xlim=c(150,190), ylim=c(40,80)) >par(new=t) >plot(height[gender== F ],weight[gender== F ], +main=,ylab=,xlab=,xlim=c(150,190), ylim=c(40,80), +axes=f,pch=0,col=2) 67

Scatter plot of bivariate data for groups: R Use Gender to geneerate the plotting symbol Weight 40 50 60 70 80 150 160 170 180 190 Height 68

Plot of bivariate data: R >par(mfrow=c(2,1)) >plot(height[gender== M ],weight[gender== M ], +main= Plot of Weight by Height for Male, +ylab= Weight,xlab= Height, +xlim=c(160,190), ylim=c(40,80)) >plot(height[gender== F ],weight[gender== F ], +main= Plot of Weight by Height for Female, +ylab= Weight,xlab= Height, +xlim=c(140,180), ylim=c(40,80)) 69

Output of separate plots by gender : R Plot of Weight by Height for Male Weight 40 50 60 70 80 160 165 170 175 180 185 190 Height Plot of Weight by Height for Female Weight 40 50 60 70 80 140 150 160 170 180 Height 70

Import the data set htwt1 Plot of bivariate data: SPSS Graph Legacy Dialogs Scatter/Dot... Choose Simple Scatter Define Move the variable height to the horizontal axis in the 2-D coordinate and weight to the vertical axis in the 2-D coordinate. 71

Scatter plot of bivariate data: SPSS 72

Plot of bivariate data for groups: SPSS Graph Legacy Dialogs Scatter/Dot... Move the variable height to the horizontal axis in the 2-D coordinate and weight to the vertical axis in the 2-D coordinate. Move the variable gender to the label cases by window. 73

Plot of bivariate data for groups: SPSS 74

Plot of bivariate data for groups: SPSS Click Options, choose Display chart with case labels 75

Output of plot by groups: SPSS 76

Separate plots by gender : SPSS Graph Legacy Dialogs Scatter/Dot... Move the variable height to the horizontal axis in the 2-D coordinate and weight to the vertical axis in the 2-D coordinate. Move the variable gender to the Panel Columns window. 77

Output of plot by gender :SPSS 78