Descriptive Statistics Lecture

Similar documents
MBA 605 Business Analytics Don Conant, PhD. GETTING TO THE STANDARD NORMAL DISTRIBUTION

Business Statistics Probability

CHAPTER 3 DATA ANALYSIS: DESCRIBING DATA

C-1: Variables which are measured on a continuous scale are described in terms of three key characteristics central tendency, variability, and shape.

Still important ideas

Still important ideas

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo

AP Psych - Stat 1 Name Period Date. MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

Readings: Textbook readings: OpenStax - Chapters 1 13 (emphasis on Chapter 12) Online readings: Appendix D, E & F

Chapter 2--Norms and Basic Statistics for Testing

Lesson 9 Presentation and Display of Quantitative Data

Readings: Textbook readings: OpenStax - Chapters 1 11 Online readings: Appendix D, E & F Plous Chapters 10, 11, 12 and 14

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo

Statistics is the science of collecting, organizing, presenting, analyzing, and interpreting data to assist in making effective decisions

Statistics is the science of collecting, organizing, presenting, analyzing, and interpreting data to assist in making effective decisions

Choosing the Correct Statistical Test

AP Psych - Stat 2 Name Period Date. MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

Results & Statistics: Description and Correlation. I. Scales of Measurement A Review

Variability. After reading this chapter, you should be able to do the following:

On the purpose of testing:

Chapter 2 Norms and Basic Statistics for Testing MULTIPLE CHOICE

Students will understand the definition of mean, median, mode and standard deviation and be able to calculate these functions with given set of

Chapter 1: Introduction to Statistics

Biostatistics. Donna Kritz-Silverstein, Ph.D. Professor Department of Family & Preventive Medicine University of California, San Diego

Analysis and Interpretation of Data Part 1

Welcome to OSA Training Statistics Part II

9 research designs likely for PSYC 2100

STATISTICS AND RESEARCH DESIGN

Stats 95. Statistical analysis without compelling presentation is annoying at best and catastrophic at worst. From raw numbers to meaningful pictures

Population. Sample. AP Statistics Notes for Chapter 1 Section 1.0 Making Sense of Data. Statistics: Data Analysis:

Medical Statistics 1. Basic Concepts Farhad Pishgar. Defining the data. Alive after 6 months?

Elementary Statistics:

Measurement and Descriptive Statistics. Katie Rommel-Esham Education 604

Announcement. Homework #2 due next Friday at 5pm. Midterm is in 2 weeks. It will cover everything through the end of next week (week 5).

3 CONCEPTUAL FOUNDATIONS OF STATISTICS

APPENDIX N. Summary Statistics: The "Big 5" Statistical Tools for School Counselors

Psychologist use statistics for 2 things

Statistics. Nur Hidayanto PSP English Education Dept. SStatistics/Nur Hidayanto PSP/PBI

Chapter 3: Examining Relationships

Standard Deviation and Standard Error Tutorial. This is significantly important. Get your AP Equations and Formulas sheet

Readings: Textbook readings: OpenStax - Chapters 1 4 Online readings: Appendix D, E & F Online readings: Plous - Chapters 1, 5, 6, 13

Distributions and Samples. Clicker Question. Review

Statistical Techniques. Masoud Mansoury and Anas Abulfaraj

Six Sigma Glossary Lean 6 Society

Lesson 8 Descriptive Statistics: Measures of Central Tendency and Dispersion

Statistics is a broad mathematical discipline dealing with

1. Introduction a. Meaning and Role of Statistics b. Descriptive and inferential Statistics c. Variable and Measurement Scales

CHAPTER 2. MEASURING AND DESCRIBING VARIABLES

Basic Statistics 01. Describing Data. Special Program: Pre-training 1

Psychology Research Process

Measuring the User Experience

Chapter 20: Test Administration and Interpretation

Statistics for Psychology

Data, frequencies, and distributions. Martin Bland. Types of data. Types of data. Clinical Biostatistics

Statistics Guide. Prepared by: Amanda J. Rockinson- Szapkiw, Ed.D.

Nature of measurement

Example The median earnings of the 28 male students is the average of the 14th and 15th, or 3+3

Statistics: Interpreting Data and Making Predictions. Interpreting Data 1/50

Appendix B Statistical Methods

Basic Statistical Concepts, Research Design, & Notation

Undertaking statistical analysis of

PRINCIPLES OF STATISTICS

Chapter 1: Exploring Data

Unit 1 Exploring and Understanding Data

Quantitative Methods in Computing Education Research (A brief overview tips and techniques)

2.4.1 STA-O Assessment 2

Introduction to biostatistics & Levels of measurement

Chapter 7: Descriptive Statistics

full file at

Student name: SOCI 420 Advanced Methods of Social Research Fall 2017

Observational studies; descriptive statistics

Psychology Research Process

An InTROduCTIOn TO MEASuRInG THInGS LEvELS OF MEASuREMEnT

Statistics and Epidemiology Practice Questions

Political Science 15, Winter 2014 Final Review

Part III Taking Chances for Fun and Profit

The normal curve and standardisation. Percentiles, z-scores

Student name: SOCI 420 Advanced Methods of Social Research Fall 2017

NORTH SOUTH UNIVERSITY TUTORIAL 1

VARIABLES AND MEASUREMENT

V. Gathering and Exploring Data

Shiken: JALT Testing & Evaluation SIG Newsletter. 12 (2). April 2008 (p )

Measurement and meaningfulness in Decision Modeling

Understanding and serving users

The Nature of Probability and Statistics

Types of questions. You need to know. Short question. Short question. Measurement Scale: Ordinal Scale

Variables and Data. Gbenga Ogunfowokan Lead, Nigerian Regional Faculty The Global Health Network 19 th May 2017

Never P alone: The value of estimates and confidence intervals

Statisticians deal with groups of numbers. They often find it helpful to use

Research Designs and Potential Interpretation of Data: Introduction to Statistics. Let s Take it Step by Step... Confused by Statistics?

Statistical analysis DIANA SAPLACAN 2017 * SLIDES ADAPTED BASED ON LECTURE NOTES BY ALMA LEORA CULEN

Introduction to Statistics

OCW Epidemiology and Biostatistics, 2010 David Tybor, MS, MPH and Kenneth Chui, PhD Tufts University School of Medicine October 27, 2010

This is Descriptive Statistics, chapter 12 from the book Psychology Research Methods: Core Skills and Concepts (index.html) (v. 1.0).

Summarizing Data. (Ch 1.1, 1.3, , 2.4.3, 2.5)

Introduction to statistics Dr Alvin Vista, ACER Bangkok, 14-18, Sept. 2015

Available as a Powerpoint data file

Level of Measurements

Transcription:

Definitions: Lecture Psychology 280 Orange Coast College 2/1/2006 Statistics have been defined as a collection of methods for planning experiments, obtaining data, and then analyzing, interpreting and drawing conclusions based on the data (Triola, 1992, p.4). This definition illustrates the strong link between research methodology and statistics The research design is the foundation of a good study o statistic can fix an inferior research design Garbage In..Garbage Out Definitions: All types of statistics can be categorized into two subgroups: Descriptive statistics Describe large amounts of data in an abbreviated way Describe important characteristics of your data Inferential Statistics Goes beyond mere description Levels of Measurement Levels of measurement can be broken down into a hierarchy with four categories: ominal Ordinal Interval Ratio Lowest Level Highest Level Statistics are appropriate or not appropriate depending on the levels of measurement of your data Use sample data to draw conclusions and make inferences about a population ominal Level of Measurement ominal level of measurement consists of names, labels and categories. Subjects are classified or identified according to common characteristics. ominal data are not graded, ranked, or scaled in any manner. ominal measurement do not imply any quantity or direction. Examples: Gender, Ethnicity Ordinal Level of Measurement Ordinal level of measurement goes beyond mere classification and assigns order to the values of the variable. Used to order or rank order objects or events However, the limitation of ordinal measurement is that the distances between values on the scale or continuum may not be either meaningful or known. In other words, ordinal scales assign order, but do not have a standardized or replicable scale Examples: Rankings, such as the order runners complete a race 1

Interval Level of Measurement Interval level of measurement not only classifies and give us a rank order, but also gives us information about the distances between the ranks. The intervals between the numbers are the same. However, the limitation of interval measurement is that the scale does not have a non-arbitrary or meaningful zero point. Thus, direct ratio comparisons are not possible. In other words, interval scales assign order, and have a standardized scale, but do not have a meaningful zero. Examples: Temperature in Fahrenheit or Celsius. Ratio Level of Measurement Ratio level of measurement is the highest level of measurement. Therefore, these variables have all the characteristics: The scale has order A standard unit of measure, and A meaningful zero point Thus, direct ratio comparisons are possible Examples: Salary, umber of Children, Siblings The only difference between interval and ratio scales is the absolute zero point. They can be mathematically treated as equivalent! Why is it important to know this? Statistical technique used depends on the level of data measured ominal and ordinal data are discrete Measured in units that cannot be divided further When playing poker, how many aces can you have? Interval and ratio data are continuous Data can be broken down into an unlimited number of values The majority of traditional statistical operations require the data to be continuous Simply describing a set of data Can be grouped into four broad categories: Tables & graphs of data Measures of central tendency Measures of dispersion or variance Measures of position (percentile/quartile, z scores) Tables & graphs of data: Frequency distribution table of age categories Many different forms of tables and graphs can be created A frequency table is a list of all values found with the corresponding frequency; can display as count of numbers or percents Graphs have been called the ultimate form of descriptive statistics.a picture is worth a thousand words 2

Frequency distribution bar chart Consider an example in which we want to find out how a sample of 25 people respond to failure in an exam when asked to circle a number on the following scale: To what extent were external factors responsible for your failure? not at all 1 2 3 4 5 6 7 extremely Let s imagine their responses were as follows: 1, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7 What do we do with this data? List these numbers in full every time we talk about this experiment? Use statistics to characterize the important properties of these responses because that s what statistics are: numerical statements about the properties of a given set of data. One of the first things we might most want to say about these data is what constitutes a typical value. This is what measures of central tendency do. Measures of central tendency Describe the central scores in the data. These measures include: Mean The mean is the arithmetic average Median The median is the middle score when data are arranged in order Mode The mode is the most frequently occurring score Mean The mean is the average value (response) calculated by summing all the values (ΣX = 110) and dividing it by the number of values ( = 25). ΣX/ = 110/25 = 4.40 Mean (con t) Only appropriate when scores are on a interval or ratio scale Scientific abbreviation is M Also known as the arithmetic average X = Xi = X1 + X2 + X3 +. X mean of a sample µ = Xi mean of a population set of scores Where: Xi.. Xn = raw scores X (read X bar )=mean of a sample set of scores µ (read mew )=mean of a population set of scores (read sigma )=summation sign =number of scores 3

Median The middle score when numbers are arranged in order If all values are ranked from 1 to, the median is the (+1)/2 value. If there is an ODD number, the median is the middle number 2, 3, 4, 5, 5, 7, 7, 7, 8 Median = 5 If there is an EVE number, the median is the mean of the two middle numbers 2, 3, 4, 5, 5, 7, 7, 7, 7, 8 Median = 6 Scores must be on a ordinal, ratio or interval scale Scientific abbreviation is Mdn Mode Most frequently occurring score 8, 7, 2, 5, 3, 7, 4, 5, 7 The mode for this distribution is 7 A distribution can have more than one mode. This type of distribution is called multimodal. 8, 7, 2, 5, 3, 7, 4, 5, 7, 5 The modes for this distribution are 5 and 7 Only measure of central tendency that can be used with nominal data Used when only a very simple description of scores is needed (e.g., the modal age of a group of individuals). It is clear that these three statistics the mean, the median and the mode differ. But do they differ in a systematic way? Yes in fact their relationship to each other depends on the overall distribution of responses. In this example, the frequencies with which each scale point is selected can be represented in a frequency graph as follows. The shaded bars here are based on the actual data obtained from an experimental sample. The best estimate of the population s behaviour (based on sample data) is represented by the curved line. When a distribution is skewed the mean, median and mode will not all be the same. In cases of skew, generally the mean is extremitized in the direction of skew more than the median which is extremitized in the direction of skew more than the mode. tail of distribution mean median mode mean median mode mean median mode ow in fact we can see that the distribution of responses here is slightly skewed in fact it is negatively skewed because the distribution s tail extends further in the negative direction. positive skew symmetrical distribution negative skew 4

ormal Distribution In cases of skew, the median may be a more appropriate statistic to describe central tendency than the mean. This is because. the median falls between the mean and mode and the mean can be a very misleading statistic if it differs appreciably from the median or mode. In fact, of the various possible distributions, one form of symmetrical distribution the normal distribution (often called the bell curve) is the most common. For statistical purposes a normal distribution of responses is also the most desirable because it has lots of useful properties (as we will see in later lectures). Things like IQ and height are normally distributed. On the other hand, income and wealth are positively skewed (so governments often report mean income because it s higher than the median; i.e., what the average person earns). The ages of students in first-year university course are also positively skewed. So, if someone wanted to make students seem young, which measure of central tendency would they use? Self-evaluations tend to be negatively skewed. If someone wanted to emphasize the extent to which this sample was externalizing, which statistic would they use? Comparison of Central Tendency Measures Baseball Salaries 2004 Anaheim Angels (Selected Players) ame Vladimir Guerrero Bartolo Colon Troy Glaus Darin Erstad Kelvin Escobar Adam Kennedy David Eckstein Shane Halter Jeff DaVanon John Lackey Francisco Rodriguez Jose Molina Chone Figgins Total Salary (Sum) Mean Median Mode Salary $11,000,000 $11,000,000 $9,900,000 $7,500,000 $5,750,000 $2,500,000 $2,150,000 $575,000 $375,000 $375,000 $375,000 $335,000 $320,000 $52,155,000.00 $4,011,923.08 $2,150,000.00 $375,000.00 All Salaries $100,534,667.00 $3,723,506.19 $2,150,000.00 $375,000.00 Source: USA Today (2004 Season) What s Happening? The mean is being pulled towards the extreme high scores This is inflating the average and misrepresenting the data The median is a better choice for a measure of central tendency when scores are known to vary in extreme The mode may not be a useful measure for data that is continuous in nature Measures of Dispersion Determine how much variability exists in a set of scores Range Standard Deviation Variance 5

Range The range is the total number of units between the highest and lowest scores Range = highest score lowest score Easy to calculate but gives only a relatively crude measure of dispersion Only measures the spread of the two extreme scores and not the spread of the scores in between 8, 7, 2, 5, 3, 7, 4, 5, 7 Range = 8 2 = 6 Standard Deviation The average deviation of scores from the mean The standard deviation is SMALL when when the majority of scores are around the mean The standard deviation ICREASES as more scores lie further from the mean score The standard deviation (symbolized as s) is derived by first calculating the variance (symbolized as s²) It is the square root of the variance Similar to the mean, only appropriate for interval and ratio level scores Standard Deviation Gives us a measure of dispersion relative to the mean Is sensitive to each score in the distribution Like the mean, the standard deviation is stable with regard to sampling fluctuations Both the mean and standard deviation can be manipulated algebraically and allows them to be used in inferential statistics calculations Standard Deviation Standard deviation tells us a lot about a distribution, particularly if that distribution is normally distributed. If this is the case, we know for example, that about 68% of all values will fall within 1 SD of the mean, 95% fall within 2 SDs and 99% fall within 3 SDs. As we will see in later lectures, this type of information is very useful to know. The Principles of Variance The basic concept is to measure how scores vary about the mean So logic says, take each score and subtract from the mean, then sum the difference and make an average But what happens? The sum will always be ZERO! To fix this, square the difference, then average. Thus, you ve arrived at the variance and it s a squared unit of measurement The main problem with variance as a measure of dispersion is that it is not given in units of X but in units of X 2. So if we were measuring height, variance would be in units of height 2 (i.e., area) Our fix is the standard deviation..take the square root of the variance. Variance & Standard Deviation Here s the statistical formula for variance: S x ² = X² -( X)² -1 Here s the statistical formula for standard deviation: S x = S x ² You might ask why divide by rather than -1? Sample data tend to under-estimate population variance, we need to make a slight adjustment to this statistic. We do this by dividing the numerator of the equations for variance and standard deviation by 1 rather than. 6

Questions? 7