Ch. 1 Collecting and Displaying Data


In the first two sections of this chapter you will learn about sampling techniques and the different levels of measurement for a variable. It is important that you learn the vocabulary in these sections because these terms will be used throughout the rest of the book. The vocabulary in Section 1.2 is especially important, because it is necessary for determining which graph is appropriate for your data (Section 1.3) and which statistical tests can be applied to your data (Chapters 3 and 4). In the last two sections of this chapter, you will learn how to use a spreadsheet in Microsoft Excel to organize your data, make graphs of your data, and perform statistical calculations using your data.

1.1 Sampling Techniques

The population of a research study is the set of all subjects (human or otherwise) that are being studied. It is often the group of people you hope would benefit from your research (for example: saxophone players, elementary school children, choral students, etc.). The sample of a research study is a subset of the population that you select and collect data from. Thus, when you are conducting research in music education, your population might be all elementary school students in a music class. Your sample is likely to be the students in your classroom or in your school district. Since it would be almost impossible to study the entire population of music students, it is necessary to study a sample of the population. In Chapter 2 we will address the issue of how to apply the results from our sample to the population.

Below are six common sampling techniques. Each is listed with an example and its pros and cons.

Random Sampling
Method: The sample is chosen using chance methods or random numbers (randomly selected). Each member of the population must have an equal chance of being selected in order for it to be random sampling.
Example: A researcher assigned a number to every person who was registered to vote in Franklin County. Then, she used a computer to generate random numbers and used those to select her sample.
Pros and cons: This sampling method gives the most accurate results. However, using this method with a large population is often impossible. Usually, we do not have access to a complete list of the members of a population.

Multistage Sampling
Method: In large research studies, researchers often begin with a large population and, in several stages, use different sampling techniques to narrow down the population in order to form a sample.
Example: A researcher divides the country into six geographic regions. From each region, she randomly selects five major cities, and from those five major cities, she randomly selects individuals using phone numbers.
Pros and cons: For large populations where random sampling is impossible, this is probably the most accurate technique. However, Multistage Sampling requires lists of subpopulations and is expensive, so it is usually used in large studies done by corporations, academic institutions, and government organizations.

Double Sampling
Method: A large 1st sample is randomly selected and data are collected from it. The data are then used to select a smaller 2nd sample from the 1st sample. The 2nd sample is then used for the research study.
Example: A marketing company was hired to find out what music was most popular with 12- to 16-year-olds. The company surveyed a large number of children. From the data, they selected all of the children ages 12 to 16 who (1) regularly buy music, (2) listen to music at least three hours per day, and (3) have parents whose income is over $50,000 per year. From this group, they randomly selected 100 children to participate in the study.
Pros and cons: This sampling technique is accurate, but it requires a large 1st sample and is expensive and time consuming. The 1st sample must be randomly selected from the target population. In this example, one could hire a marketing company with access to sales lists from music retailers.
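The random and multistage sampling methods described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the voter ID numbers, the region and city names, and the function names `simple_random_sample` and `multistage_sample` are all hypothetical, and the fixed seeds exist only to make the run repeatable.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct members; every member has the same chance of selection."""
    rng = random.Random(seed)
    return rng.sample(population, n)


def multistage_sample(regions, cities_per_region, people_per_city, seed=None):
    """Stage 1: randomly pick cities within each region.
    Stage 2: randomly pick people within each chosen city."""
    rng = random.Random(seed)
    chosen = []
    for cities in regions.values():
        for city in rng.sample(sorted(cities), cities_per_region):
            chosen.extend(rng.sample(cities[city], people_per_city))
    return chosen


# Hypothetical sampling frame: one ID number per registered voter.
voter_ids = list(range(1, 10001))
sample = simple_random_sample(voter_ids, 100, seed=42)

# Hypothetical country: 6 regions, 8 cities per region, 50 people per city.
regions = {
    f"region{r}": {
        f"city{r}_{c}": [f"person{r}_{c}_{i}" for i in range(50)]
        for c in range(8)
    }
    for r in range(6)
}
staged = multistage_sample(regions, cities_per_region=5, people_per_city=10, seed=42)

print(len(sample), "voters selected by simple random sampling")
print(len(staged), "people selected by multistage sampling")
```

Note that the simple random sample needs the complete list `voter_ids`, which is exactly the requirement that makes this method impractical for many large populations.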

Cluster Sampling
Method: The sample is created from intact groups that are randomly selected. When using this method, there are two important rules that must be followed: (1) the intact groups must be randomly chosen, and (2) every person in each intact group must be included in the sample.
Example: A researcher was studying calculator use in high school math classes in the state of Michigan. Twenty high schools were randomly selected (these are the clusters, the intact groups) and then every student taking a math class in one of those schools was surveyed.
Pros and cons: This is a more cost-effective way to sample a large population. However, each intact group is going to have a smaller amount of variation than the population. In this example, you need to select enough schools so that you reduce the possibility of overrepresenting (or underrepresenting) schools in wealthy neighborhoods.

Stratified Sampling
Method: The population is divided into groups that are important to the study and then individuals from each group are randomly selected. In other words, the sample is constructed to represent each of these groups.
Example: A medical researcher divided the population into four categories: male smokers, male non-smokers, female smokers, and female non-smokers. Twenty persons from each category were randomly selected to be in the study.
Pros and cons: While the sample does a good job representing the population on the characteristics chosen (in this case gender and smoking), it may not represent the population regarding other characteristics. If you are not studying smoking, this is a bad sample because smokers make up 50% of the sample, but only 21% of the U.S. population (Kaufman, 2006).

Convenience Sampling
Method: The researcher selects the sample that is most convenient.
Example: A researcher received permission from five colleagues to use the students in their music classes for a research study.
Pros and cons: While this method adds more sampling error than the first four methods listed above, it is often the only method possible.

A common error made by many students is to conflate the concepts of random selection and random assignment. The example below illustrates the difference.

Example: A music teacher is performing research at her school. There are two sections of 1st grade classes, so she flips a coin to decide which section of 1st grade is the control group and which section is the experimental group. This is a case of random assignment, but not random selection. She did not randomly select the school and did not randomly select the students to be in the class; thus, it is not a random sample. The sample was chosen by convenience, and then each group in the experiment was randomly assigned.

Biased Samples

When we perform our research, no matter which method we use to select our sample, there will always be some error when we use the results obtained from our sample to make claims about the population. This is because the sample is not exactly the same as the population. Thus, we must do our best to select a sample that is representative of the population.

Samples are said to be biased samples if some type of systematic error has been made when selecting the sample. Any results calculated from a biased sample cannot be applied to the population with any specific accuracy. This is because we cannot calculate the probability that the sample will be significantly different from the population due to random chance alone.

This is a very important point. When we select a sample randomly, we might (by chance alone) end up with a sample that differs from the population according to some trait that is relevant to our research study. For example, you could, by random chance alone, end up with a situation where the average I.Q. of your sample is much higher than the average I.Q. of the population.
However, if the sample was chosen randomly, we can calculate the probability of this occurring by chance alone, and account for this in our statistical analysis.
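To make the idea of "calculating the probability of this occurring by chance alone" concrete, here is a minimal simulation sketch. It rests on assumptions that are not part of the text above: I.Q. scores in the population are modeled as normally distributed with mean 100 and standard deviation 15, the sample size is 25, and "much higher" is taken to mean a sample average above 105. The function names are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean


def simulated_prob_mean_exceeds(threshold, n, trials=20000,
                                mu=100.0, sigma=15.0, seed=0):
    """Estimate, by repeated random sampling, how often the average of a
    random sample of size n lands above the threshold."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample_mean = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if sample_mean > threshold:
            hits += 1
    return hits / trials


def exact_prob_mean_exceeds(threshold, n, mu=100.0, sigma=15.0):
    """Same probability from the normal model: the sample mean has
    standard deviation sigma / sqrt(n)."""
    standard_error = sigma / math.sqrt(n)
    return 1.0 - NormalDist(mu, standard_error).cdf(threshold)


simulated = simulated_prob_mean_exceeds(105, n=25)
exact = exact_prob_mean_exceeds(105, n=25)
print(f"simulated: {simulated:.3f}   exact: {exact:.3f}")
```

Both numbers describe random samples only. No comparable calculation is available for a sample chosen by convenience, because the selection process is not governed by known chances.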

Now suppose that for convenience, we selected our sample on a street corner near the campus of M.I.T. There is a much greater chance that the average I.Q. of our sample is higher than the average I.Q. of the U.S. population. Thus, the difference in I.Q. between the sample and the population is not based on random chance alone.

A sample is supposed to represent the population in a research study. If the population we are studying is U.S. residents, and I.Q. is relevant to our study, then our sample collected near the campus of M.I.T. is biased and will not represent the population. However, if the population we are studying is M.I.T. undergraduate students, then our sample should contain only M.I.T. undergraduate students (who were randomly selected). In this case, selecting M.I.T. undergraduates who pass by a corner near the campus would not result in a biased sample because we can calculate the probability that our sample is different from our population due to random chance alone. We will discuss why it is important that you be able to calculate this probability in Chapter 2, when we discuss in detail how you can apply the results collected from your sample to the population.

Here is another example of a biased sample: A researcher would like to determine what percentage of registered voters favor reinstating prohibition on alcohol. The researcher goes to Cheers Bar and conducts a survey to count the percentage of registered voters who favor prohibition. Even though 100% of the researcher's sample opposed prohibition, that does not mean that 100% of the population opposes prohibition, because the sample does not represent the population. In fact, randomly selecting people on the sidewalk outside of Cheers Bar would still result in a biased sample because some of those individuals would be in that location because they are either coming from or going to the bar.
The two previous examples illustrate samples that are clearly biased and do not represent the population; however, in practice, researchers do not make mistakes as obvious as the ones in the examples. There are more subtle ways for a sample to be biased, and we will discuss these in more detail in Chapters 2 and 4. For now, we will conclude by mentioning a sampling method that is sometimes used in the media, but is rarely used in research because it can lead to biased samples.

A popular, but statistically problematic, method of sampling is called the Self-Selected Sampling Method. This is when the sample is constructed from volunteers. One example is an internet, TV, or radio poll where the audience may vote or give their opinion. One of the biggest problems with self-selected samples is that they are more likely to contain individuals who feel passionate about the issue (on either side) and less likely to contain individuals who have an opinion, but are less passionate about the issue. Thus, the sample is not representative of the population. In addition, samples like the ones mentioned above only contain persons who visit a particular website, watch a particular TV show, or listen to a particular radio station.

1.2 Levels of Measurement for Variables

Once you have selected your sample for your research, you are going to want to collect data from the sample and record the results. Researchers will often collect data by asking questions on surveys, measuring properties (such as blood pressure or vocal range), or by getting results from tests (such as the Scholastic Aptitude Test or the Iowa Test of Music Literacy). Storing this data in a spreadsheet makes it easier to perform statistical calculations. Below is an example where a researcher collected data from her sample and stored the results in a spreadsheet. (Each person in the study was given a number so that no individual could be identified.)

Each row is a subject in the sample and each column after the first is a variable.

Person   Eye Color     T-Shirt Size   I.Q.   Age
1        Dark Brown    Medium         100    20.25
2        Light Brown   Large          105    23.75
3        Blue          Medium         100    26.33
4        Dark Brown    Extra-Large    110    21.17
5        Dark Brown    Large           95    24.66
6        Dark Brown    Large          100    25.75
7        Blue          Medium         120    29.83
8        Light Brown   Extra-Large    100    22.25
9        Green         Small           95    25.66
10       Dark Brown    Large           90    21.33

Each heading (Eye Color, T-Shirt Size, I.Q., and Age) is a variable and the data listed under each heading are the values for the variable. For instance, the entries in the third column are the values for the variable T-Shirt Size, and the entries in the fourth column are the values for the variable I.Q. Notice that each variable has a different type of values: one has colors, one has sizes, one has numbers that are integers, and one has numbers with decimals. This is because each variable has a different level of measurement. It is important that you learn about the different levels of measurement for a variable, because this information will be needed in the next section as well as throughout Chapter 3 and Chapter 4.

There are four commonly used levels of measurement for a variable: Nominal, Ordinal, Interval, and Ratio.

Nominal: A variable whose values are categories that have no meaningful ranking has a Nominal Level of Measurement. Qualitative variables (also called categorical variables) have a nominal level of measurement. Eye Color has a nominal level of measurement since the values are categories that do not have a meaningful ranking.

Ordinal: A variable has an Ordinal Level of Measurement if its values have a meaningful ranking, but no precise measure between the values exists. T-Shirt Size has an ordinal level of measurement since the values can be ranked, but we do not know the precise difference between each size. In other words, we know a medium is bigger than a small, but we do not know how much bigger.

Interval: A variable has an Interval Level of Measurement if (1) its values have a meaningful ranking, (2) precise measurements between the values exist, and (3) there is no meaningful zero. I.Q. has an interval level of measurement because you cannot measure zero intelligence. However, precise measurements between the values exist since the difference between an I.Q. of 100 and 110 is the same as the difference between an I.Q. of 150 and 160.

Ratio: A variable has a Ratio Level of Measurement if (1) its values have a meaningful ranking, (2) precise measurements between the values exist, and (3) there is a meaningful zero. Age has a ratio level of measurement: we can define zero as the official time of birth.

Qualitative variables have values that can be placed into distinct categories based upon some characteristic or attribute. They are sometimes called categorical variables. Qualitative variables have a nominal level of measurement.

Quantitative variables are variables whose values can be ranked from smallest to largest. Quantitative variables can have an ordinal, interval, or ratio level of measurement.

In the previous example, the variable Eye Color has a nominal level of measurement since there is no relevant ranking for the values blue, green, and brown. Eye color consists of categories rather than rankings on a scale from smallest to largest, which means that Eye Color is a qualitative variable. Even if you assigned the numbers 1, 2, and 3 to blue, green, and brown, Eye Color would still have a nominal level of measurement, because it would be meaningless to say that green is greater than blue even though 2 is greater than 1.

The variable T-Shirt Size has values that have a relevant ranking: small, medium, large, extra-large. However, the difference in size between a small and a medium is not precise or consistent across all manufacturers. Also, the difference in size between a small and a medium may not be in proportion with the difference in size between a medium and a large. Thus, there is no precise difference between each size, which means that it has an ordinal level of measurement. Another way to think of it is: we know a medium is bigger than a small, but we do not know how much bigger.

The variable I.Q. has an interval level of measurement because you cannot measure zero intelligence. However, precise measurements between the values exist since the difference between an I.Q. of 100 and 110 is the same as the difference between an I.Q. of 150 and 160. Because there is no meaningful zero, one cannot make meaningful ratios. For example, it does not make sense to say that someone with an I.Q. of 100 is twice as smart as someone whose I.Q. is 50. In this case, an I.Q. of 100 means the person has an average level of intelligence, but an I.Q. of 50 would mean the person was non-verbal, unable to take care of themselves, and would require 24-hour supervision.

Temperature in degrees Fahrenheit is another variable with an interval level of measurement. Precise measurements between the values exist, but 0°F does not mean there is no heat. Thus, one cannot make meaningful ratios with temperature. For example, it is not the case that a temperature of 10°F is twice as hot as a temperature of 5°F.
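The Fahrenheit example can be checked with a little arithmetic. The sketch below converts both temperatures to kelvins, a scale whose zero is absolute zero; the Kelvin scale is not discussed in this section and is brought in here only as an illustration. The function name `fahrenheit_to_kelvin` is hypothetical.

```python
def fahrenheit_to_kelvin(degrees_f):
    """Convert degrees Fahrenheit to kelvins (0 K is absolute zero)."""
    return (degrees_f - 32.0) * 5.0 / 9.0 + 273.15


# The ratio of the raw Fahrenheit numbers suggests "twice as hot" ...
ratio_fahrenheit = 10.0 / 5.0

# ... but on a scale with a meaningful zero the two temperatures are
# almost the same.
ratio_kelvin = fahrenheit_to_kelvin(10.0) / fahrenheit_to_kelvin(5.0)

# Differences, unlike ratios, stay consistent: two 10-degree Fahrenheit
# gaps convert to the same gap in kelvins.
gap_low = fahrenheit_to_kelvin(20.0) - fahrenheit_to_kelvin(10.0)
gap_high = fahrenheit_to_kelvin(60.0) - fahrenheit_to_kelvin(50.0)

print(f"ratio of Fahrenheit readings: {ratio_fahrenheit:.3f}")
print(f"ratio of kelvin readings:     {ratio_kelvin:.3f}")
print(f"gaps in kelvins:              {gap_low:.3f} and {gap_high:.3f}")
```

This is why an interval-level variable supports statements about differences but not statements such as "twice as much".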
The variable Age has a ratio level of measurement because (1) ages can be ordered from youngest to oldest, (2) there are precise measurements between the values, and (3) it has a meaningful zero: the official time of birth. Also, you can form ratios. It is true to say that someone who is 16 years old is twice as old as someone who is 8 years old (they have been alive twice as long).

The variable Annual Income also has a ratio level of measurement because (1) income can be ranked from smallest to largest, (2) there are precise measurements between the values, and (3) it is possible to have $0 as an annual income. Also, you can form ratios. A person who makes $60,000 per year makes one and a half times as much money as someone who makes $40,000 per year.

Continuous and Discrete Variables

A variable is continuous if, given any two of its values v1 and v2, all of the numbers between v1 and v2 are possible outcomes for the variable (i.e., the variable does not skip over any numbers). One continuous variable is Height. Not only is it possible to be 60 inches tall or 61 inches tall, it is possible to be any height between 60 and 61 inches tall, such as 60.37 inches.

A variable is discrete if it is not continuous. That is, a discrete variable does skip over values. One discrete variable is Class Size. You could have 0, 1, 2, 3, 4, ... students in your class, but you could not have 3½ students in your class, so Class Size is not continuous.

Note: Variables with an ordinal level of measurement are always discrete.

Example 1: Suppose you have a rating scale that only has the values 0.5, 1.0, 1.5, 2.0, 2.5, and 3.0. Even though there are decimals, this is still a discrete scale because the value 1.25 is not possible (it is "skipped over").

Example 2: The variable Average Class Size is continuous even though the variable Class Size is discrete. You cannot have a class size of 25.3 students, but you can have an average class size of 25.3 students (or any other fraction).

Example 3: Suppose you measure the variable Height using the intervals (groups): less than 51 inches, 51 to 55 inches, 56 to 60 inches, 61 to 65 inches, and greater than 65 inches (measurements are rounded). In this case, the variable Height is discrete, not continuous, because there are five discrete groups with no possible measurements between consecutive groups.

Example 4: A continuous variable has infinitely many outcomes. If a discrete variable has a huge number of outcomes, we can treat the variable as if it is continuous. The variable Annual Income is technically discrete since $0.01 is the smallest unit of money and you cannot earn half of a penny; however, we usually treat this variable as if it is continuous since there are literally millions of values this variable can have. Also, even though we round off measurements like height to the nearest millimeter, we can treat the variable as if it is continuous.

Example 5: Suppose we use intervals for Annual Income such as: less than $50,000 per year, $50,000 to $100,000 per year, $100,000 to $150,000 per year, and over $150,000 per year. In this case, the variable Annual Income would be discrete and have only four outcomes.

Special Problems with the Nominal Level of Measurement

Even in quantitative research we often use qualitative variables. For example, we might want to compare the average income of male lawyers to that of female lawyers. This would be quantitative research and Average Income is a quantitative variable (ratio and continuous); however, the variable Gender is a qualitative variable with a nominal level of measurement since the variable's values are the categories male and female.

Whenever you use a qualitative variable, you have to be careful when measuring the subject. The values of a qualitative variable are categories. These categories must be mutually exclusive (a subject cannot be in more than one category) and the categories must cover all of the possible characteristics.
For example, for the variable Eye Color you must have a category for every possible eye color and the categories must be separate, with no person being in more than one category. As you can imagine, it might not be possible to do this precisely; however, this is the goal that one must aim for when using variables like this for research. Many other qualitative variables are quite difficult to measure. For example, the variable Ethnicity is on many questionnaires and government forms. Even if you offer your subjects a choice of Caucasian, Hispanic, African, African-American, European, Middle Eastern, and East Asian, you will still have left out many ethnicities while at the same time, grouping many different cultures and ethnicities into the same category. In addition, if a person has a mother who is Hispanic and a father who is from Africa, it is unclear which category should be chosen. For the purpose of learning statistics, we will avoid complications like this and focus on variables that are much easier to measure and define. The main objective in this section is for you to be able to identify the level of measurement of a variable, because this will determine how you should graph your variable and which statistical tools can be applied to your variable (Chapters 3 and 4).
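The requirement that categories be mutually exclusive and cover every possible case applies equally when a numerical variable such as Annual Income is grouped into intervals. The sketch below is one possible way to enforce it. The decision to put a boundary value such as exactly $50,000 into the higher group is an assumption made for this illustration (the interval descriptions in the text do not say which group a boundary value belongs to), and the names `INCOME_BINS` and `income_category` are hypothetical.

```python
# Each bin is (lower bound, upper bound, label) and covers lower <= income < upper.
# Assumption: a value on a boundary goes into the higher bin, so no income
# can fall into two bins.
INCOME_BINS = [
    (0.0, 50_000.0, "less than $50,000"),
    (50_000.0, 100_000.0, "$50,000 to under $100,000"),
    (100_000.0, 150_000.0, "$100,000 to under $150,000"),
    (150_000.0, float("inf"), "$150,000 or more"),
]


def income_category(income):
    """Return the single category that an annual income belongs to."""
    if income < 0:
        raise ValueError("annual income cannot be negative in this sketch")
    matches = [label for low, high, label in INCOME_BINS if low <= income < high]
    if len(matches) != 1:
        # Would indicate overlapping bins or a gap between bins.
        raise RuntimeError(f"income {income} matched {len(matches)} categories")
    return matches[0]


for income in (0, 49_999.99, 50_000, 125_000, 150_000, 2_000_000):
    print(income, "->", income_category(income))
```

The grouped variable has only four possible outcomes, so it is discrete even though the underlying incomes are treated as continuous.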