Introduction to Statistical Data Analysis I

Similar documents
Understandable Statistics

Unit 1 Exploring and Understanding Data

Medical Statistics 1. Basic Concepts Farhad Pishgar. Defining the data. Alive after 6 months?

Chapter 1: Exploring Data

Population. Sample. AP Statistics Notes for Chapter 1 Section 1.0 Making Sense of Data. Statistics: Data Analysis:

Table of Contents. Plots. Essential Statistics for Nursing Research 1/12/2017

SPRING GROVE AREA SCHOOL DISTRICT. Course Description. Instructional Strategies, Learning Practices, Activities, and Experiences.

2.4.1 STA-O Assessment 2

Summarizing Data. (Ch 1.1, 1.3, , 2.4.3, 2.5)

WDHS Curriculum Map Probability and Statistics. What is Statistics and how does it relate to you?

V. Gathering and Exploring Data

Statistics is the science of collecting, organizing, presenting, analyzing, and interpreting data to assist in making effective decisions

Statistics is the science of collecting, organizing, presenting, analyzing, and interpreting data to assist in making effective decisions

STATISTICS AND RESEARCH DESIGN

What you should know before you collect data. BAE 815 (Fall 2017) Dr. Zifei Liu

Undertaking statistical analysis of

Quantitative Methods in Computing Education Research (A brief overview tips and techniques)

Outline. Practice. Confounding Variables. Discuss. Observational Studies vs Experiments. Observational Studies vs Experiments

Chapter 1: Introduction to Statistics

Data, frequencies, and distributions. Martin Bland. Types of data. Types of data. Clinical Biostatistics

Business Statistics Probability

Organizing Data. Types of Distributions. Uniform distribution All ranges or categories have nearly the same value a.k.a. rectangular distribution

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo

HW 1 - Bus Stat. Student:

Still important ideas

Readings: Textbook readings: OpenStax - Chapters 1 4 Online readings: Appendix D, E & F Online readings: Plous - Chapters 1, 5, 6, 13

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo

Still important ideas

Knowledge discovery tools 381

M 140 Test 1 A Name SHOW YOUR WORK FOR FULL CREDIT! Problem Max. Points Your Points Total 60

PRINTABLE VERSION. Quiz 1. True or False: The amount of rainfall in your state last month is an example of continuous data.

Chapter 1. Picturing Distributions with Graphs

Probability and Statistics. Chapter 1

Results & Statistics: Description and Correlation. I. Scales of Measurement A Review

Chapter 1: Explaining Behavior

Stats 95. Statistical analysis without compelling presentation is annoying at best and catastrophic at worst. From raw numbers to meaningful pictures

10/4/2007 MATH 171 Name: Dr. Lunsford Test Points Possible

Frequency Distributions

Averages and Variation

Biostatistics. Donna Kritz-Silverstein, Ph.D. Professor Department of Family & Preventive Medicine University of California, San Diego

Section I: Multiple Choice Select the best answer for each question.

MBA 605 Business Analytics Don Conant, PhD. GETTING TO THE STANDARD NORMAL DISTRIBUTION

PRINCIPLES OF STATISTICS

Basic Statistics 01. Describing Data. Special Program: Pre-training 1

Research Methods in Forest Sciences: Learning Diary. Yoko Lu December Research process

Readings: Textbook readings: OpenStax - Chapters 1 13 (emphasis on Chapter 12) Online readings: Appendix D, E & F

Lesson 9 Presentation and Display of Quantitative Data

Unit 7 Comparisons and Relationships

Frequency Distributions

How to interpret scientific & statistical graphs

Types of Statistics. Censored data. Files for today (June 27) Lecture and Homework INTRODUCTION TO BIOSTATISTICS. Today s Outline

STP226 Brief Class Notes Instructor: Ela Jackiewicz

AP Stats Review for Midterm

Frequency distributions

Statistical Methods Exam I Review

1) What is the independent variable? What is our Dependent Variable?

Announcement. Homework #2 due next Friday at 5pm. Midterm is in 2 weeks. It will cover everything through the end of next week (week 5).

Students will understand the definition of mean, median, mode and standard deviation and be able to calculate these functions with given set of

Chapter 7: Descriptive Statistics

Measurement and Descriptive Statistics. Katie Rommel-Esham Education 604

Statistics is a broad mathematical discipline dealing with

2.75: 84% 2.5: 80% 2.25: 78% 2: 74% 1.75: 70% 1.5: 66% 1.25: 64% 1.0: 60% 0.5: 50% 0.25: 25% 0: 0%

Methodological skills

9 research designs likely for PSYC 2100

Chapter 2--Norms and Basic Statistics for Testing

4.3 Measures of Variation

SCATTER PLOTS AND TREND LINES

Section 1.2 Displaying Quantitative Data with Graphs. Dotplots

UNIT V: Analysis of Non-numerical and Numerical Data SWK 330 Kimberly Baker-Abrams. In qualitative research: Grounded Theory

Statistics. Nur Hidayanto PSP English Education Dept. SStatistics/Nur Hidayanto PSP/PBI

M 140 Test 1 A Name (1 point) SHOW YOUR WORK FOR FULL CREDIT! Problem Max. Points Your Points Total 75

Instructions and Checklist

Lesson 8 Descriptive Statistics: Measures of Central Tendency and Dispersion

bivariate analysis: The statistical analysis of the relationship between two variables.

MTH 225: Introductory Statistics

STT 200 Test 1 Green Give your answer in the scantron provided. Each question is worth 2 points.

Here are the various choices. All of them are found in the Analyze menu in SPSS, under the sub-menu for Descriptive Statistics :

Chapter 2 Norms and Basic Statistics for Testing MULTIPLE CHOICE

CHAPTER 3 Describing Relationships

Business Statistics (ECOE 1302) Spring Semester 2011 Chapter 3 - Numerical Descriptive Measures Solutions

Department of Statistics TEXAS A&M UNIVERSITY STAT 211. Instructor: Keith Hatfield

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo

The normal curve and standardisation. Percentiles, z-scores

(a) 50% of the shows have a rating greater than: impossible to tell

STATISTICS & PROBABILITY

CHAPTER 2. MEASURING AND DESCRIBING VARIABLES

List of Figures. List of Tables. Preface to the Second Edition. Preface to the First Edition

Statistics and Epidemiology Practice Questions

Appendix B Statistical Methods

Example The median earnings of the 28 male students is the average of the 14th and 15th, or 3+3

Test 1C AP Statistics Name:

Statistics: Making Sense of the Numbers

Part 1. For each of the following questions fill-in the blanks. Each question is worth 2 points.

ANOVA in SPSS (Practical)

Analysis and Interpretation of Data Part 1

CHAPTER 3 DATA ANALYSIS: DESCRIBING DATA

Readings: Textbook readings: OpenStax - Chapters 1 11 Online readings: Appendix D, E & F Plous Chapters 10, 11, 12 and 14

First Hourly Quiz. SW 430: Research Methods in Social Work I

QMS102- Business Statistics I Chapter1and 3

3. For a $5 lunch with a 55 cent ($0.55) tip, what is the value of the residual?

Transcription:

Introduction to Statistical Data Analysis I JULY 2011 Afsaneh Yazdani

Preface What is Statistics?

Preface What is Statistics? Science of: designing studies or experiments, collecting data Summarizing/modeling/analyzing data for the purpose of decision making/scientific discovery when the available information is both limited and variable.

Preface What is Statistics? Science of: designing studies or experiments, collecting data Summarizing/modeling/analyzing data for the purpose of decision making & scientific discovery when the available information is both limited and variable. Statistics is the science of Learning from Data

Preface Learning from Data

Preface Learning from Data Qualitative or quantitative attributes of a variable Data are typically the results of measurements or observations of a variable Data are often viewed as the lowest level of abstraction from which information and then knowledge are derived.

Preface Learning from Data Four-step process by which we can learn from data: 1. Defining the Problem 2. Collecting the Data 3. Summarizing the Data 4. Analyzing Data, Interpreting the Analyses, and Communicating the results

Preface Defining the Problem - Understanding the problem being addressed - Specifying the objective of the study - Identifying the variables of interest - Reviewing the previous studies - Brainstorming of the experts - Importance rating of the factors

Preface Data Gathering The most appropriate method to collect the data, should be selected. Data collection processes include:

Preface Data Gathering The most appropriate method to collect the data, should be selected. Data collection processes include: - Surveys Surveys are passive. The goal of the survey is to gather data on existing conditions, attitudes, or behaviors, without any interference.

Preface Data Gathering The most appropriate method to collect the data, should be selected. Data collection processes include: - Surveys - Experiments Experimental studies, tend to be more active. The person conducting the study, varies the experimental conditions to study the effect of the conditions on the outcome of the experiment.

Preface Data Gathering The most appropriate method to collect the data, should be selected. Data collection processes include: - Surveys - Experiments - Examination of existing data from business records, censuses, government records, and previous studies

Preface Using Surveys to gather data The manner in which the sample is selected from the population (sampling design) must be determined, so that the sample accurately reflects the population as a whole (representative sample) Population Sample

Preface What can affect representativeness of a sample? - Sampling Design - Nonresponse - Measurement Problems - Inability to recall answers to questions - Leading questions - Unclear wording of questions,

Preface Data Collection Techniques - Personal Interview - Telephone Interview - Self-Administered Questionnaire - Direct Observation

Preface Data Collection Techniques - Personal Interview - Telephone Interview - Self-Administered Questionnaire - Direct Observation Advantages: Higher Response Rate Disadvantages: Cost Trained Interviewer

Preface Data Collection Techniques - Personal Interview - Telephone Interview - Self-Administered Questionnaire - Direct Observation Advantages: Low Cost Easier to Monitor Interviewer Disadvantages: Unavailability of a list that closely corresponds to the population, People who screen calls before answering, Interview must be short

Preface Data Collection Techniques - Personal Interview - Telephone Interview - Self-Administered Questionnaire - Direct Observation Advantages: Cost Disadvantages: Low Response Rate Difficult to word questionnaire

Preface Data Collection Techniques - Personal Interview - Telephone Interview - Self-Administered Questionnaire - Direct Observation Advantages: Data not-affected by respondent Disadvantages: Possibility of error in observation Time-consuming

Preface Using Experiments to gather data In experimental studies, researcher should follow a systematic plan (experiment plan) prior to running the experiment. The plan includes: - Research objectives of the experiment - Selection of the factors that will be varied (treatments)

Preface Using Experiments to gather data In experimental studies, researcher should follow a systematic plan (experiment plan) prior to running the experiment. The plan includes: - Identification of extraneous factors that may be present in the experimental units or in the environment of the experimental setting (blocking factors)

Preface Using Experiments to gather data In experimental studies, researcher should follow a systematic plan (experiment plan) prior to running the experiment. The plan includes: - Characteristics to be measured on the experimental units (response variable)

Preface Using Experiments to gather data In experimental studies, researcher should follow a systematic plan (experiment plan) prior to running the experiment. The plan includes: - Method of randomization, either randomly selecting from treatment populations or the random assignment of experimental units to treatments

Preface Using Experiments to gather data In experimental studies, researcher should follow a systematic plan (experiment plan) prior to running the experiment. The plan includes: - Procedures to be used in recording the responses from the experimental units

Preface Using Experiments to gather data In experimental studies, researcher should follow a systematic plan (experiment plan) prior to running the experiment. The plan includes: - Selection of the number of experimental units assigned to each treatment may require designating the level of significance and power of tests or the precision and reliability of confidence intervals

Preface Experimental Designs - Completely Randomize Design - Randomized Block Design - Latin Square Design - Factorial Treatment Structure

Preface Experimental Designs - Completely Randomize Design When comparing t treatments (t levels of a single factor) An SRS sample of observations for each treatment Sample size for each treatment can be different Comparing Tire Wear for 4 Brands of Tire Car 1 Car 2 Car 3 Car 4 Brand B Brand A Brand A Brand D Brand B Brand A Brand B Brand D Brand B Brand C Brand C Brand D Brand C Brand C Brand A Brand D Training Workshop on Statistical Data Analysis 8-21 July 2011 Afsaneh Yazdani

Preface Experimental Designs - Completely Randomize Design - Randomized Block Design (similar to stratified random sample) When comparing t treatments Each treatment is assigned to a block (homogenous group) Comparing Tire Wear for 4 Brands of Tire Car 1 Car 2 Car 3 Car 4 Brand A Brand A Brand A Brand A Brand B Brand B Brand B Brand B Brand C Brand C Brand C Brand C Brand Training D Brand Workshop D Brand on Statistical D Brand Data D Analysis 8-21 July 2011 Afsaneh Yazdani

Preface Experimental Designs - Completely Randomize Design - Randomized Block Design (similar to stratified random sample) - Latin Square Design When comparing t treatments Having two blocking variables Comparing Tire Wear for 4 Brands of Tire Position Car 1 Car 2 Car 3 Car 4 Right/Front Brand A Brand B Brand C Brand D Left/Front Brand B Brand C Brand D Brand A Right/Back Brand C Brand D Brand A Brand B Left/Back Training Brand Workshop D Brand on A Statistical Brand B Data Brand Analysis C 8-21 July 2011 Afsaneh Yazdani

Preface Experimental Designs - Completely Randomize Design - Randomized Block Design (similar to stratified random sample) - Latin Square Design - Factorial Treatment Structure several factors rather than just t levels of a single factor whether or not interaction exists

Major branches of Statistics: - Descriptive Statistics the set of measurements available is frequently the entire population making sense of the data by reducing a large set of measurements to a few summary measures which reflect good picture of the whole

Major branches of Statistics: - Descriptive Statistics - Inferential Statistics Measurements of all units in the population are not available. A sample is drawn to draw conclusions about the population

Major branches of Statistics: - Descriptive Statistics - Inferential Statistics

Major Methods of Data Describing: - Graphical Techniques - Numerical Descriptive Techniques

Describing Data on a Single Variable: Graphical Methods - Categorized/Nominal/Ordinal Measurement - Continuous Measurement

Graphs 1. Pie Chart: Used to display the percentage of the total number of measurements falling into each category of the variable by partitioning a circle (similar to slicing a pie).

Graphs Guidelines for Constructing Pie Charts: 1. Choose a small number of categories for the variable because too many make the pie chart difficult to interpret. 2. Whenever possible, sort the percentages in either ascending or descending order.

Graphs 2. Bar Chart: Used to display the frequency or value of measurements of each category of variable (sometimes across time)

Graphs Guidelines for Constructing Bar Charts: 1. Label values on one axis and categories of the variable on the other axis. 2. The height of the bar is equal to the frequency or value. 3. Leave a space between each category for distinction.

Graphs 3. Histogram: Frequency/Relative Frequency Histogram, applicable only to quantitative/continuous measurements, used to show the distribution of a variable

Graphs Guidelines for Constructing Histograms: 1. Divide the range of the measurements by the approximate number of class intervals desired. (generally 5-20), then round it to get common width for class intervals.

Graphs Guidelines for Constructing Histograms: 1. Divide the range of the measurements by the approximate number of class intervals desired. (generally 5-20), then round it to get common width for class intervals. Too small number, can hide most of the patterns or trends in the data

Graphs Guidelines for Constructing Histograms: 2. Choose the first class interval so that it contains the smallest measurement 3. Construct the frequency/ relative frequency table

Graphs Guidelines for Constructing Histograms: 4. Construct the histogram based on the frequency/ relative frequency of class intervals.

Graphs Guidelines for Constructing Histograms: 4. Construct the histogram based on the frequency/ relative frequency of class intervals. each rectangle is constructed over each class interval with a height equal to the class frequency or relative frequency

Graphs 4. Stem-and-leaf Plot: A clever, simple device for constructing a histogram-like picture of a frequency distribution

Graphs Guidelines for Constructing Stem-and-leaf Plot: 1. Split each score or value into two sets of digits. The first or leading set of digits is the stem and the second or trailing set of digits is the leaf. 2. List all possible stem digits from lowest to highest. 3. For each score in the mass of data, write the leaf values on the line labeled by the appropriate stem number.

Graphs Guidelines for Constructing Successful Graphics: Before constructing a graph, set your priorities. Choose the appropriate type of graph. Pay attention to the titles. Use the colors cleverly.

Describing Data on a Single Variable: Numerical Measures Numerical descriptive measures are commonly used to convey a mental image of pictures, objects, and other phenomena. The two most common numerical descriptive measures are: - Measures of central tendency (location/position) - Skewness - Measures of variability

Measures of Central Tendency Describe center of the distribution of measurements Mode Mode of a set of Measurements is defined to be the measurement that occurs most often (with the highest frequency). Distributions can be Unimodal or bi-modal Mode of grouped data Modal Interval is the interval with the highest frequency. Mode is the mid-point of the modal interval

Measures of Central Tendency Describe center of the distribution of measurements Median The median of a set of measurements is defined to be the middle value when the measurements are arranged from lowest to highest. The median for an even number of measurements is the average of the two middle values. Median of grouped data L: Lower limit of the class interval containing the median n: Total frequency cfb: Cumulative frequency for all class intervals before median class interval fm: Frequency of class interval containing the median w: interval width med. = L + w 0.5n cfb fm

Measures of Central Tendency Describe center of the distribution of measurements Mean The arithmetic mean or mean of a set of measurements is defined to be the sum of measurements divided by the total number of measurements. The population (sample) mean is denoted by (y) μ = N i=1 N Y i, y = n i=1 n y i Mean of grouped data y i : Mid-point of the i-th class interval f i : Frequency of the i-th class interval n: Total number of measurements k ( f i i=1 ) y = k i=1 n f i y i Large number of class intervals, closer to actual mean.

Measures of Central Tendency Describe center of the distribution of measurements Geometric Mean Geometric mean of a set of measurements is: n G = (X 1 X 2 X n ) Appropriate for ratios, Only for data sets containing positive observations Harmonic Mean Harmonic mean of a set of measurements is: n H = 1 + 1 + + 1 X 1 X 2 X n Appropriate for rates of changes Only for data sets containing positive observations

Measures of Central Tendency Major Characteristics of Mode 1. It is the most frequent or probable measurement in the data set. 2. There can be more than one mode for a data set. 3. It is not influenced by extreme measurements. 4. Modes of subsets cannot be combined to determine the mode of the complete data set. 5. For grouped data its value can change depending on the categories used. 6. It is applicable for both qualitative/quantitative data.

Measures of Central Tendency Major Characteristics of Median 1. It is the central value; 50% of the measurements lie above it and 50% fall below it. 2. There is only one median for a data set. 3. It is not influenced by extreme measurements. 4. Medians of subsets cannot be combined to determine the median of the complete data set. 5. For grouped data, its value is rather stable even when the data are organized into different categories. 6. It is applicable to quantitative data only.

Measures of Central Tendency Major Characteristics of Mean 1. It is the arithmetic average of the measurements in a data set. 2. There is only one mean for a data set. 3. Its value is influenced by extreme measurements; trimming can help to reduce the degree of influence. (50% trimmed mean is the median) 4. Means of subsets can be combined to determine the mean of the complete data set. 5. It is applicable to quantitative data only.

Measures of Central Tendency Major Characteristics of Mean 1. It is the arithmetic average of the measurements in a data set. 2. There is only one mean for a data set. Measures of central tendency do not 3. Its value is influenced by extreme measurements; trimming provide can help a to complete reduce the picture degree of of the influence. (50% trimmed frequency mean distribution the median) for a set of 4. Means of subsets can measurements. be combined to determine the mean of the complete data set. 5. It is applicable to quantitative data only.

Skewness Skewness is: How mode, median, and mean are related for a set of measurements? Mode = Median = Mean

Skewness Skewness is: How mode, median, and mean are related for a set of measurements? Mode < Median < Mean Mean < Median < Mode

Skewness Skewness is: How mode, median, and mean are related for a set of measurements? m 3 = n i=1 y i y 3 n 1 standardized skewness measure = m 3 s 3

Measures of Variability Describe variability of measurements Range The range of a set of measurements is defined to be the difference between the largest and the smallest measurements of the set. - Easy to compute - Sensitive to outliers - Does not give much information about the pattern of variability Range of grouped data The difference between the upper limit of the last interval and the lower limit of the first interval.

Measures of Variability Describe variability of measurements Percentiles The p-th percentile of a set of n measurements arranged in order of magnitude is that value that has at most p% of the measurements below it and at most (100 p)% above it. 25th, 50th, and 75th percentiles, often called: the lower quartile, the middle quartile (median), and the upper quartile, respectively Percentiles of grouped data L: Lower limit of the class interval containing the percentile n: Total frequency cfb: Cumulative frequency for all class intervals before percentile class interval fm: Frequency of class interval containing the percentile w: interval width P 90 = L + w 0.9n cfb fm

Measures of Variability Describe variability of measurements Interquartile Range The interquartile range (IQR) of a set of measurements is defined to be the difference between the upper and lower quartiles; that is, IQR=75 th percentile - 25 th percentile IQR=3 rd quartile 1 st quartile Interquartile Range 1. Less Sensitive to extreme measurements 2. Misleading when data concentrates about the median 3. Covers only the variability of 50% of measurements 4. Less useful for a single set of measurements, quite useful for comparing variability of two or more data sets

Measures of Variability Describe variability of measurements Variance The variance of a set of n measurements y 1, y 2,. y n with mean y is : s 2 = n i=1 y i y 2 n 1 The population (sample) variance is denoted by σ 2 (s 2 ) Variance of grouped data y i : Mid-point of the i-th class interval f i : Frequency of the i-th class interval n: Total number of k measurements ( i=1 f i ) n s 2 i=1 f i y i y 2 n 1 Large number of class intervals, closer to actual sample variance

Measures of Variability Describe variability of measurements Standard Deviation The standard deviation of a set of measurements is defined to be the positive square root of the variance It yields a measure of variability having the same units of measurements as the original data. Standard Deviation is appealing because: We can compare the variability of two or more sets of data using the standard deviation

Measures of Variability Describe variability of measurements Standard Deviation The standard deviation of a set of measurements is defined to be the positive square root of the variance It yields a measure of variability having the same units of measurements as the original data. Standard Deviation is appealing because: We can use the results of the Empirical Rule that follows to interpret the standard deviation of a single set of measurements

Measures of Variability Empirical Rule: Given a set of n measurements possessing a moundshaped histogram, then: the interval y ± s contains approximately 68% of the measurements the interval y ± 2s contains approximately 95% of the measurements the interval y ± 3s contains approximately 99.7% of the measurements.

Measures of Variability Empirical Rule: Given a set of n measurements possessing a moundshaped histogram, then: the interval y ± s contains approximately 68% of the measurements the interval y ± 2s contains approximately 95% of the measurements Approximate value for s : the interval y ± 3s contains approximately 99.7% s = Range of the measurements. 4 (better to overestimate)

Measures of Variability Describe variability of measurements Coefficient of Variability In a process or population with mean and standard deviation, the coefficient of variation is defined as: CV =, provided μ 0, μ sometimes expressed as a percentage CV = 100 σ % μ CV is unit-free so it can be used to: Compare the variability in two considerably different processes or populations. If CV is 15%, the standard deviation of the population is 15% of its mean

Measures of Variability Describe variability of measurements Coefficient of Variability In a process or population with mean and standard deviation, the coefficient of variation is defined as: CV = μ, provided μ 0, sometimes expressed as a percentage CV = 100 s μ % CV is unit-free so it can be used to: as an index of population variability i.e. populations with similar CVs, have similar variability

Measures of Variability Summary Values Required for minimal description: Minimum, Lower Quartile, Median, Higher Quartile, Maximum

5. Boxplot: Range Interquartile Range Minimum Q1 Median Q3 Maximum

5. Boxplot: With a quick glance, it gives an impression about: 1. The lower and upper quartiles, Q1 and Q3 2. The interquartile range (IQR) 3. The most extreme (lowest and highest) values 4. The symmetry or asymmetry of the distribution Lower Upper Inner Fence Q1-1.5 IQR Q3+1.5 IQR Outer Fence Q1-3 IQR Q3+3 IQR

from More Than One Variable: Graphs and Correlation - Contingency table - Graphs - Correlation Coefficient

from More Than One Variable: Contingency Table Cross-tabulation to develop percentage comparisons to be used to describe the relationship between two qualitative variables.

from More Than One Variable: Graphs - Stacked bar graph - Cluster bar graph - Scatter plot - Side-by-side boxplots

from More Than One Variable: Graphs - Stacked bar graph - Cluster bar graph - Scatter plot - Side-by-side boxplots Extension of bar chart, for a pair of qualitative variables 45 34.6 46.9 46.9 38.6 38.6 90 27.4 27.4 2002 2003 2002 West East North

from More Than One Variable: Graphs - Stacked bar graph - Cluster bar graph - Scatter plot - Side-by-side boxplots Extension of bar chart, for a single quantitative and a qualitative variable 90 46.9 45 38.6 34.6 27.4 2002 2003 East West North

from More Than One Variable: Graphs - Stacked bar graph - Cluster bar graph - Scatter plot - Side-by-side boxplots Visual assessment of relationship between two quantitative variables

from More Than One Variable: Graphs - Stacked bar graph - Cluster bar graph - Scatter plot - Side-by-side boxplots Visual assessment of distribution of quantitative variables.

from More Than One Variable: Correlation Coefficient The correlation coefficient measures the strength of the linear relationship between two quantitative variables. The correlation coefficient is usually denoted as r.

from More Than One Variable: Correlation Coefficient r = 1 n 1 n i=1 x i x s x y i y s y

from More Than One Variable: Correlation Coefficient Properties - Value of r is a number between -1 and 1. Closer value to ±1 means stronger association between variables. - A positive (negative) value for r indicates a positive (negative) association between the two variables.

from More Than One Variable: Correlation Coefficient Properties - r is a unit-free measure of association. - r measures the degree of straight line relationship between two variables not a curved relationship, no matter how strong the relationship is.