Lecture Start


Lecture -- 5 -- Start

Outline
1. Science, Method & Measurement
2. On Building An Index
3. Correlation & Causality
4. Probability & Statistics
5. Samples & Surveys
6. Experimental & Quasi-experimental Designs
7. Conceptual Models
8. Quantitative Models
9. Complexity & Chaos
10. Recapitulation - Envoi


Quantitative Techniques for Social Science Research Lecture # 5: Samples And Surveys Ismail Serageldin Alexandria 2012

Sample Surveys are among the most studied and written about topics in statistics

So: no textbooks. Just follow the presentation.

Why Do Sample Surveys

Why do we do sample surveys?

We want to know something about the Population so we study a small sample of the Population (making sure that the sample is representative) Source: Statistics, Cliffs Quick Review, Wiley, NY, 2001

So we will discuss how to undertake sampling and how to do surveys

Let's start with some definitions

Data, Variables, Statistics and Parameters

Variables A variable is an attribute that describes a person, place, thing, or idea. The value of the variable can "vary" from one entity to another. Qualitative variables are categorical: e.g. the color of a ball is green, red or blue. Quantitative variables are numeric: e.g. the population of a city. Source: http://stattrek.com/statistics/data-collection-methods.aspx?tutorial=ap

Quantitative Variables: Continuous and Discrete Continuous variables can take any value within a range: e.g. the weight of each person in a class. Discrete variables can take only separate, countable values, usually whole numbers: e.g. when tossing a coin several times, how many times do we get heads? It can never be 2.7 times; it has to be 0, 1, 2, 3, ..., n. Source: http://stattrek.com/statistics/data-collection-methods.aspx?tutorial=ap

TEST Which of the following statements are true? I. All variables can be classified as quantitative or categorical variables. II. Categorical variables can be continuous variables. III. Quantitative variables can be discrete variables. Answer: I and III are correct Source: http://stattrek.com/statistics/data-collection-methods.aspx?tutorial=ap


Two Snapshots, Two States: Discrete variables imply sudden moves from state to state; continuous variables imply constantly changing transitions between two snapshots

Transitions can be cut up in discrete states

But many transitions are really continuous

Example: Students leaving school and entering the Labor Market

Later we will discuss how this fits into Markov chains and the manpower model

But let's go back to the issues of Data Collection

Methods Of Data Collection There are four main methods of data collection. Census. A census is a study that obtains data from every member of a population. In most studies, a census is not practical, because of the cost and/or time required. Sample survey. A sample survey is a study that obtains data from a subset of a population, in order to estimate population attributes. Source: http://stattrek.com/statistics/data-collection-methods.aspx?tutorial=ap

Methods of Data Collection (Cont'd) Experiment. An experiment is a controlled study in which the researcher attempts to understand cause-and-effect relationships. Observational study. The researcher is not able to control (1) how subjects are assigned to groups and/or (2) which treatments each group receives. (Case studies are observations of one case.) Note: Observational studies do NOT allow you to draw firm cause-and-effect conclusions. Source: http://stattrek.com/statistics/data-collection-methods.aspx?tutorial=ap

Why do Sample Surveys? The reason for conducting a sample survey is to estimate the value of some attribute of a population. It is much cheaper and easier than doing a whole census. When done scientifically, we can quantify the error term accurately (e.g. ±3%). Source: http://stattrek.com/statistics/data-collection-methods.aspx?tutorial=ap

Pros and Cons Resources. A well-designed sample survey can provide very precise estimates of population parameters - quicker, cheaper, and with less manpower than a census. Generalizability. Applying findings from a study to a larger population. Generalizability requires random selection. Source: http://stattrek.com/statistics/data-collection-methods.aspx?tutorial=ap

Pros and Cons (continued) Causal inference. Cause-and-effect relationships can be teased out when subjects are randomly assigned to groups. Therefore, experiments, which allow the researcher to control assignment of subjects to treatment groups, are the best method for investigating causal relationships Source: http://stattrek.com/statistics/data-collection-methods.aspx?tutorial=ap

We will have a lot more to say on Experimental Designs later.

We must distinguish between the sample statistic and the population parameter

From Population To Sample To Population: (From Sample Statistic To Population Parameter) Source: Statistics, Cliffs Quick Review, Wiley, NY, 2001

Population Parameter vs. Sample Statistic Population parameter. A population parameter is the true value of a population attribute. Sample statistic. A sample statistic is an estimate, based on sample data, of a population parameter. The estimate comes with an error term (e.g. ±3%). Source: http://stattrek.com/statistics/data-collection-methods.aspx?tutorial=ap

Example Of Population Parameter vs. Sample Statistic Example. We want to know the percentage of voters that favor a new tax. The actual percentage of all the voters is a population parameter. The estimate of that percentage, based on sample data, is a sample statistic. The quality of a sample statistic (i.e., accuracy, precision, representativeness) is strongly affected by the way that sample observations are chosen; that is, by the sampling method. Source: http://stattrek.com/statistics/data-collection-methods.aspx?tutorial=ap

Bad Surveys make for bad estimates

Estimates of the front runners in the Egyptian Presidential Election 2012
Before the first round: 1. Abdel Moneim Aboulfotouh, 2. Amr Moussa, 3. Mohamed Morsi, 4. Hamdein Sabahi, 5. Ahmed Shafik
After the first round: 1. Mohamed Morsi, 2. Ahmed Shafik, 3. Hamdein Sabahi, 4. Abdel Moneim Aboulfotouh, 5. Amr Moussa

The US 1948 Presidential Election: Truman vs. Dewey

Bad (Inaccurate) Polls

What does it mean to say: the poll says 52% (±3%) at the 95% confidence level? The 52% is the finding from the sample survey. The error term (±3%) is related to the sampling error: it means that we think the real value is between 49% and 55%. The 95% confidence level means that if we repeated the survey many times, about 95 out of 100 intervals built this way would contain the real figure for the population. It describes the reliability of the method; it is not a 95% chance attached to this one interval. The error term will vary according to the size of the sample.

What is sampling error? (The margin of error, or the ±3%) Sampling error is the calculated statistical imprecision due to interviewing a random sample instead of the entire population. The margin of error estimates how much the sample results may differ, due to chance, from what would have been found if the entire population had been interviewed. The confidence level (95%, or 95 out of 100) says how often intervals built with that ± error term would capture the true population value if the survey were repeated.
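The ± figure for a percentage is usually computed with the normal-approximation formula MoE = z × sqrt(p(1 − p)/n). At a 95% confidence level, z is about 1.96. Below is a minimal Python sketch that assumes a simple random sample. The sample size of 1,000 is also an assumption, since the slide does not state one.

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Normal-approximation margin of error for a sample proportion.

    p: sample proportion (0.52 means 52%)
    n: sample size (the formula assumes a simple random sample)
    z: critical value; 1.96 corresponds to a 95% confidence level
    """
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical poll: 52% in a simple random sample of 1,000 respondents.
p_hat, n = 0.52, 1000
moe = margin_of_error(p_hat, n)
low, high = p_hat - moe, p_hat + moe
print(f"52% ± {moe:.1%}: interval {low:.1%} to {high:.1%}")
```

With these inputs the margin comes out at roughly 3.1 percentage points, in line with the ±3% quoted above.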

Sampling error Sampling error is related to sample size, but it is not the only kind of error possible in sample surveys. You can look it up in sampling error tables such as the one shown here. This table was produced by Gallup for a sample from a target population of 200 million, with a confidence level of 95%.

Recommended allowance for sampling error of a percentage (in percentage points, at the 95 in 100 confidence level)

Sample size         1,000   750   500   250   100
Percentage near 10      2     2     3     4     6
Percentage near 20      3     3     4     5     9
Percentage near 30      3     4     4     6    10
Percentage near 40      3     4     5     7    10
Percentage near 50      3     4     5     7    11
Percentage near 60      3     4     5     7    10
Percentage near 70      3     4     4     6    10
Percentage near 80      3     3     4     5     9
Percentage near 90      2     2     3     4     6

Table extracted from 'The Gallup Poll Monthly'. Cited at http://www.ropercenter.uconn.edu/education/polling_fundamentals_error.html

An Important Observation: Statistical Error and Sample Size As the sample size increases, there are diminishing returns in percentage error. At percentages near 50%, the statistical error drops from 7 to 5 percentage points as the sample size is increased from 250 to 500. But if the sample size is increased from 750 to 1,000, the statistical error drops only from 4 to 3 points. Above 1,000, the diminishing returns are even more noticeable.
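The diminishing returns come from the square root in the margin-of-error formula. The margin shrinks in proportion to 1/sqrt(n), so halving the margin takes about four times the sample. The minimal sketch below inverts the formula to find the sample size needed for a target margin. It assumes simple random sampling and the conservative p = 0.5. Gallup's table above runs somewhat higher than this textbook formula, so treat the results as rough guides.

```python
import math

def required_sample_size(margin: float, p: float = 0.5, z: float = 1.96) -> int:
    """Smallest n whose normal-approximation margin of error is at most `margin`.

    Solves margin = z * sqrt(p(1-p)/n) for n and rounds up.
    p = 0.5 is the most conservative choice because it maximizes p(1-p).
    """
    return math.ceil(z**2 * p * (1 - p) / margin**2)

for target in (0.05, 0.03, 0.015):
    print(f"±{target:.1%} needs n ≈ {required_sample_size(target)}")
```

Going from ±3% to ±1.5% raises the required sample from about 1,068 to about 4,269 respondents.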

Among others, Langer Research Associates offers a margin-of-error calculator, the MoE Machine, as a convenient tool for data producers and everyday data users. Access the MoE Machine at http://langerresearch.com/moe.php.

So, let's learn more about surveys and sampling

Types of Samples

What is a Survey? A survey may refer to many different types or techniques of observation, but it most often involves a questionnaire used to measure the characteristics and/or attitudes of people. Since we do not cover the whole population, we select a sample. The different ways of contacting members of a sample once they have been selected are the subject of survey data collection.

What is Survey Sampling? In statistics, survey sampling describes the process of selecting a sample of elements from a target population in order to conduct a survey. The purpose of sampling is to reduce the cost and/or the amount of work that it would take to survey the entire target population. A survey that measures the entire target population is called a census.

Sampling

Two Kinds of Survey Samples Non-Probability samples and Probability samples

Sampling Methods Non-probability samples. We do not know the probability that each population element will be chosen, and/or we cannot be sure that each population element has a non-zero chance of being chosen. Probability samples. Each population element has a known (non-zero) chance of being chosen for the sample. Source: http://stattrek.com/statistics/data-collection-methods.aspx?tutorial=ap

Non-Probability Sampling

Pros & cons of Non-Probability Sampling Advantages: convenience and cost. Disadvantage: We cannot estimate the extent to which sample statistics are likely to differ from population parameters. Only probability sampling methods permit that kind of analysis. Source: http://stattrek.com/statistics/data-collection-methods.aspx?tutorial=ap

Two of the main types of non-probability sampling methods Voluntary sample. People who self-select into the survey. Often, these folks have a strong interest in the main topic of the survey. E.g. those who call in to a talk show or participate in an online poll. This would be a volunteer sample. Convenience sample. A convenience sample is made up of people who are easy to reach. E.g. interviewing my students, my employees or shoppers at a local mall. If the group or the location was chosen because it was convenient, this would be a convenience sample. Note: Neither allows generalization to the population. Source: http://stattrek.com/statistics/data-collection-methods.aspx?tutorial=ap

Non-probability Sample Surveys Surveys that are not based on probability sampling have no way of measuring their bias or sampling error. Surveys based on non-probability samples are not externally valid. You cannot generalize from them to the general population. They can only be said to be representative of the people who have actually completed the survey.

Non-Probability Samples The relationship between the target population and the survey sample is immeasurable and potential bias is unknowable. Sophisticated users of non-probability survey samples tend to view the survey as an experimental condition, rather than a tool for population measurement. Analysts examine the results for internally consistent relationships.

Examples Of Non-Probability Samples Judgment Samples: A researcher decides which population members to include in the sample based on his or her judgment. The researcher may provide some alternative justification for the representativeness of the sample. Snowball Samples: Often used when a target population is rare; members of the target population recruit other members of the population for the survey.

Examples Of Non-Probability Samples Quota Samples: The sample is designed to include a designated number of people with certain specified characteristics. For example, 100 coffee drinkers. This type of sampling is common in non-probability market research surveys. Convenience Samples: The sample is composed of whatever persons can be most easily accessed to fill out the survey.

Probability Sampling

Probability samples are the only ones whose results will be generalizable to the entire population

Random Samples

Ronald Fisher (1890-1962)

Extract from table of random numbers

Main types of probability sampling Simple random sampling, Stratified sampling, Cluster sampling, Multistage sampling, and Systematic random sampling. Source: http://stattrek.com/statistics/data-collection-methods.aspx?tutorial=ap

Probability Samples are representative The key benefit of all these probability sampling methods is that, because chance alone decides who is selected, the sample is representative of the population on average. We can also measure how far its statistics are likely to stray from the population parameters. This is what makes the statistical conclusions valid. Hence the conclusions are generalizable. Source: http://stattrek.com/statistics/data-collection-methods.aspx?tutorial=ap

Simple Random sampling The population consists of N objects. The sample consists of n objects. If all possible samples of n objects are equally likely to occur, the sampling method is called simple random sampling. Selection is done by a lottery method, or by using a table of random numbers or a computerized random number generator. Source: http://stattrek.com/statistics/data-collection-methods.aspx?tutorial=ap
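A computerized random number generator can do the lottery directly. Here is a minimal sketch using Python's standard `random` module. The employee IDs are hypothetical.

```python
import random
from typing import Optional

def simple_random_sample(population: list, n: int, seed: Optional[int] = None) -> list:
    """Draw n distinct elements so that every subset of size n is equally likely."""
    rng = random.Random(seed)          # seeded generator so the draw can be reproduced
    return rng.sample(population, n)   # sampling without replacement

# Hypothetical sample frame: 500 employees identified by ID.
frame = [f"EMP{i:03d}" for i in range(1, 501)]
chosen = simple_random_sample(frame, 10, seed=42)
print(chosen)
```

Fixing the seed makes the draw reproducible, which helps when documenting how a sample was chosen. Omit the seed for a fresh draw.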

Stratified Sampling Stratified sampling. The population is divided into groups, based on some characteristic. The groups are called strata. Then, within each group, a probability sample (often a simple random sample) is selected. As an example, suppose we conduct a national survey. We might divide the population into groups or strata, based on geography: north, east, south, and west. Then, within each stratum, we might randomly select survey respondents. Source: http://stattrek.com/statistics/data-collection-methods.aspx?tutorial=ap
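Here is a minimal sketch of stratified sampling, drawing a simple random sample inside each stratum. The population, the region labels and the equal allocation of 25 per stratum are illustrative assumptions. Allocating in proportion to stratum size is another common choice.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, n_per_stratum, seed=None):
    """Simple random sample of a given size within each stratum.

    units: list of population members
    stratum_of: function mapping a unit to its stratum label (e.g. its region)
    n_per_stratum: dict mapping stratum label -> sample size for that stratum
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for u in units:                      # split the population into strata
        strata[stratum_of(u)].append(u)
    return {label: rng.sample(members, n_per_stratum[label])
            for label, members in strata.items()}

# Hypothetical population: 1,000 (person_id, region) pairs, 250 per region.
regions = ["north", "east", "south", "west"]
people = [(i, regions[i % 4]) for i in range(1000)]
by_region = stratified_sample(people, lambda p: p[1], {r: 25 for r in regions}, seed=1)
```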

Cluster sampling Cluster sampling. With cluster sampling, every member of the population is assigned to one, and only one, group. Each group is called a cluster. A sample of clusters is chosen, using a probability method (often simple random sampling). Only individuals within sampled clusters are surveyed. E.g. select a sample of BA units and survey all the staff in these units. Source: http://stattrek.com/statistics/data-collection-methods.aspx?tutorial=ap
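Here is a minimal sketch of one-stage cluster sampling: a simple random sample of clusters, then everyone in the chosen clusters. The unit names and staff lists are hypothetical stand-ins for the BA units mentioned above.

```python
import random

def cluster_sample(clusters: dict, n_clusters: int, seed=None) -> list:
    """Randomly choose n_clusters whole clusters and return every member of them.

    clusters: dict mapping cluster name -> list of members; each member
    belongs to exactly one cluster.
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)   # simple random sample of cluster names
    return [member for name in chosen for member in clusters[name]]

# Hypothetical organization: 8 units (A to H), each with 5 staff members.
units = {f"unit_{c}": [f"{c}{j}" for j in range(5)] for c in "ABCDEFGH"}
staff = cluster_sample(units, 3, seed=7)
```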

Multistage sampling. Multistage sampling. With multistage sampling, we select a sample by using combinations of different sampling methods. For example, in Stage 1, we might use cluster sampling to choose clusters from a population. Then, in Stage 2, we might use simple random sampling to select a subset of elements from each chosen cluster for the final sample. Source: http://stattrek.com/statistics/data-collection-methods.aspx?tutorial=ap
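Stage 1 and Stage 2 as described above can be sketched as follows. This is a minimal two-stage example. It assumes hypothetical schools as the clusters and pupils as the elements.

```python
import random

def two_stage_sample(clusters: dict, n_clusters: int, n_within: int, seed=None) -> dict:
    """Stage 1: simple random sample of clusters.
    Stage 2: simple random sample of n_within members inside each chosen cluster."""
    rng = random.Random(seed)
    stage1 = rng.sample(sorted(clusters), n_clusters)
    return {name: rng.sample(clusters[name], n_within) for name in stage1}

# Hypothetical frame: 20 schools with 30 pupils each.
schools = {f"school_{k:02d}": [f"s{k:02d}_p{j:02d}" for j in range(30)] for k in range(20)}
sample = two_stage_sample(schools, n_clusters=4, n_within=5, seed=3)
```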

Systematic random sampling. Systematic random sampling. With systematic random sampling, we create a list of every member of the population. From the list, we randomly select the first sample element from the first k elements on the population list. Thereafter, we select every kth element on the list. This method is different from simple random sampling since not every possible sample of n elements is equally likely. Source: http://stattrek.com/statistics/data-collection-methods.aspx?tutorial=ap
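Here is a minimal sketch of systematic random sampling on a hypothetical list of 100 members with k = 10. Only the starting point is random, which is why just k different samples are possible.

```python
import random

def systematic_sample(frame: list, k: int, seed=None) -> list:
    """Random start among the first k elements, then every k-th element after it."""
    rng = random.Random(seed)
    start = rng.randrange(k)   # an index from 0 to k-1, i.e. one of the first k elements
    return frame[start::k]

frame = list(range(1, 101))    # hypothetical population list of 100 members
sample = systematic_sample(frame, k=10, seed=5)
```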

How To Select A Probability Sample


Probability Sampling A probability-based survey sample is created by constructing a list of the target population, called the sample frame; a randomized process for selecting units from the sample frame, called a selection procedure; and a method of contacting selected units and enabling them to complete the survey, called a data collection method or mode.

Probability Sampling: Step 1 Construct a Sample frame: A probability-based survey sample is created by constructing a list of the target population, called the sample frame. For some target populations this process may be easy, for example, sampling the employees of a company by using the payroll list. However, in large, disorganized populations simply constructing a suitable sample frame is often a complex and expensive task.

Probability Sampling: Step 2 Selecting a sample from within the Sample frame: a randomized process for selecting units from the sample frame, called a selection procedure. Common methods of conducting a probability sample of the household population in the United States are Area Probability Sampling, Random Digit Dial telephone sampling, and more recently Address-Based Sampling.

Specialized Techniques Of Probability Sampling Within probability sampling there are specialized techniques such as stratified sampling and cluster sampling. These techniques improve the precision or efficiency of the sampling process without altering the fundamental principles of probability sampling.

Probability Sampling: Step 3 Collecting the Data: There must be a method of contacting selected units and enabling them to complete the survey, called a data collection method or mode.

Sources Of Bias

Major Types of Bias In Surveys Non-response bias Coverage bias Selection bias


Major Types of Bias In Surveys Non-response bias: When individuals or households selected in the survey sample cannot or will not complete the survey there is the potential for bias to result from this non-response. Non-response bias occurs when the observed value deviates from the population parameter due to differences between respondents and non-respondents.


Major Types of Bias In Surveys Coverage bias: Coverage bias can occur when population members do not appear in the sample frame (undercoverage). Coverage bias occurs when the observed value deviates from the population parameter due to differences between covered and noncovered units. Telephone surveys suffer from a well-known source of coverage bias because they cannot include households without telephones.


Major Types of Bias In Surveys Selection Bias: Selection bias occurs when some units have a differing probability of selection that is unaccounted for by the researcher. For example, some households have multiple phone numbers making them more likely to be selected in a telephone survey than households with only one phone number. This selection bias would be corrected by applying a survey weight equal to [1/(# of phone numbers)] to each household.
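Here is a minimal sketch of the 1/(# of phone numbers) weighting, used to compute a weighted share of "yes" answers. The respondent data are invented for illustration.

```python
# Hypothetical respondents: (number of phone numbers in the household, answered "yes"?)
respondents = [(1, True), (2, True), (3, False), (1, False), (2, True), (1, True)]

def weighted_share_yes(rows):
    """Share answering yes, weighting each household by 1 / (number of phone numbers)."""
    total_weight = sum(1 / phones for phones, _ in rows)
    yes_weight = sum(1 / phones for phones, yes in rows if yes)
    return yes_weight / total_weight

unweighted = sum(yes for _, yes in respondents) / len(respondents)
weighted = weighted_share_yes(respondents)
print(f"unweighted: {unweighted:.3f}, weighted: {weighted:.3f}")
```

Here the unweighted share is 4/6 ≈ 0.667 and the weighted share is 9/13 ≈ 0.692. Households with several numbers count for less, offsetting their higher chance of selection.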

But how you select your sample is only one of the issues in doing survey research

Bias Due to Measurement Error In survey research, the measurement process includes the environment in which the survey is conducted, the way that questions are asked, and the state of the survey respondent. Response bias refers to the bias that results from problems in the measurement process. Some examples of response bias: Source: http://stattrek.com/statistics/data-collection-methods.aspx?tutorial=ap

Examples of Response Bias (Due to error in the Measurement process) Leading questions. The wording of the question may be loaded in some way to unduly favor one response over another. For example, a satisfaction survey may ask the respondent to indicate whether she is satisfied, dissatisfied, or very dissatisfied. By giving the respondent one response option to express satisfaction and two response options to express dissatisfaction, this survey question is biased toward getting a dissatisfied response. Source: http://stattrek.com/statistics/data-collection-methods.aspx?tutorial=ap

Examples of Response Bias Cont'd (Due to error in the Measurement process) Social desirability. Most people like to present themselves in a favorable light, so they will be reluctant to admit to unsavory attitudes or illegal activities in a survey, particularly if survey results are not confidential. Instead, their responses may be biased toward what they believe is socially desirable. Source: http://stattrek.com/statistics/data-collection-methods.aspx?tutorial=ap

Sampling Statistic and Sampling Error A survey produces a sample statistic, which is used to estimate a population parameter. If you repeated a survey many times, using different samples each time, you might get a different sample statistic with each replication. And each of the different sample statistics would be an estimate for the same population parameter. If the statistic is unbiased, the average of all the statistics from all possible samples will equal the true population parameter; even though any individual statistic may differ from the population parameter. The variability among statistics from different samples is called sampling error. Source: http://stattrek.com/statistics/data-collection-methods.aspx?tutorial=ap
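The idea of repeated surveys can be simulated. This minimal sketch assumes a hypothetical population of 100,000 voters, 40% of whom favor the tax. It draws 1,000 simple random samples of 500 and compares the average of the sample statistics with the parameter. The standard deviation of the statistics serves as the measure of sampling error.

```python
import random
import statistics

rng = random.Random(2012)

# Hypothetical population: 1 = favors the new tax, 0 = does not (40% favor).
population = [1] * 40_000 + [0] * 60_000
true_p = sum(population) / len(population)   # the population parameter

def survey(n):
    """One replication: the sample proportion from a simple random sample of size n."""
    return sum(rng.sample(population, n)) / n

estimates = [survey(500) for _ in range(1000)]
mean_est = statistics.mean(estimates)        # average over replications
sampling_error = statistics.stdev(estimates) # spread of the sample statistics
print(f"true p = {true_p:.3f}, mean of estimates = {mean_est:.3f}, "
      f"sampling error = {sampling_error:.3f}")
```

The mean of the estimates should land very close to 0.40. Individual estimates typically stray by about 2 percentage points.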

Increasing The Sample size: Reduces Sampling Error but NOT Survey Bias Increasing the sample size tends to reduce the sampling error; that is, it makes the sample statistic less variable. However, increasing sample size does not affect survey bias. A large sample size cannot correct for the methodological problems (undercoverage, nonresponse bias, etc.) that produce survey bias. Example: The Literary Digest Survey sample size was very large - over 2 million surveys were completed; but the large sample size could not overcome problems with the sample - undercoverage and nonresponse bias. Source: http://stattrek.com/statistics/data-collection-methods.aspx?tutorial=ap
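Here is a minimal simulation of undercoverage, using invented numbers. 70% of the population is on the sample frame, and 55% of them say yes. The 30% left off the frame say yes only 20% of the time. Sampling more people from the same flawed frame tightens the estimate around the wrong value.

```python
import random

rng = random.Random(7)

# Hypothetical population: 70,000 people on the sample frame, 30,000 left off it.
covered = [1] * 38_500 + [0] * 31_500      # 55% of covered people say yes
not_covered = [1] * 6_000 + [0] * 24_000   # 20% of uncovered people say yes
true_p = (sum(covered) + sum(not_covered)) / (len(covered) + len(not_covered))

def frame_estimate(n):
    """Sample proportion from a simple random sample drawn only from the incomplete frame."""
    return sum(rng.sample(covered, n)) / n

small = frame_estimate(100)
large = frame_estimate(20_000)
print(f"true p = {true_p:.3f}, n=100: {small:.3f}, n=20,000: {large:.3f}")
```

The true share is 0.445, but both estimates center on about 0.55. The large sample is simply more precisely wrong.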

The Null Hypothesis & Types of Error

To analyze survey data and arrive at a conclusion, we need to formulate a Null Hypothesis

Null Hypothesis It is usually a statement that can be falsified and whose acceptance or rejection yields a useful insight into the problem being studied and for which the data was collected. The null hypothesis is a hypothesis which the researcher tries to disprove, reject or nullify. It is symbolized by H0.

The first to formalize the notion of the Null Hypothesis: Ronald Fisher (1890-1962)

How do you state your basic (null) Hypothesis? Usually: the normal state (don't worry, no effect, no change). Or: there is no difference between expected and observed (i.e. the difference is due to chance only)


One-tailed or Two-tailed Tests
One-tailed: Accept H0 | Reject H0 (the rejection region lies in one tail)
Two-tailed: Reject H0 | Accept H0 | Reject H0 (rejection regions in both tails)

Usually: No directionality: use a two-tailed test. Directionality: use a one-tailed test.

The Null Hypothesis identifies which kind of test is needed: one-tailed or two-tailed. In classical science, H0 is most typically the statement that there is no effect of a particular treatment. In observations, it is typically that there is no difference between the value of a particular measured variable and that of a prediction, or between two means. In those cases we use a two-tailed test. But when there is directionality, i.e. when we say that something is better than, bigger than or less than something else, we use a one-tailed test.
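Here is a minimal sketch of how the choice of tail changes the p-value. It assumes a test statistic that is standard normal under H0, i.e. a z-test; the slide does not name a specific test. It also assumes an observed z = 1.80.

```python
from statistics import NormalDist

def p_values(z: float) -> dict:
    """p-values for an observed z statistic under a standard normal null distribution."""
    std_normal = NormalDist()                           # mean 0, standard deviation 1
    upper_tail = 1 - std_normal.cdf(z)                  # P(Z >= z): one-tailed, H1 "greater"
    two_tailed = 2 * (1 - std_normal.cdf(abs(z)))       # P(|Z| >= |z|): two-tailed, H1 "different"
    return {"one-tailed": upper_tail, "two-tailed": two_tailed}

pv = p_values(1.80)
for name, p in pv.items():
    print(f"{name}: p = {p:.4f}")
```

With z = 1.80 the one-tailed p-value is about 0.036 and the two-tailed one about 0.072. At the 5% level, the one-tailed test rejects H0 and the two-tailed test does not.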

But in accepting or rejecting the null hypothesis, we could be making two different types of error.

Type I error (false positive: rejecting a null hypothesis that is actually true): Test says: This person has cancer. Reality: This person is healthy. Test says: This person is guilty. Reality: This person is not guilty. Test says: This product is faulty. Reality: This product is good.

Type II error (false negative: failing to reject a null hypothesis that is actually false): Test says: This person is healthy. Reality: This person has cancer. Test says: This person is not guilty. Reality: This person is guilty. Test says: This product is good. Reality: This product is faulty.
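The two error types can be written as a tiny decision table. This is an illustrative sketch; the function name `classify_decision` is hypothetical, and the screening example assumes $H_0$ is "this person is healthy", so rejecting $H_0$ means the test says "this person has cancer".

```python
def classify_decision(h0_true: bool, rejected_h0: bool) -> str:
    """Label a test decision, given whether H0 is really true."""
    if h0_true and rejected_h0:
        return "Type I error (false positive)"
    if not h0_true and not rejected_h0:
        return "Type II error (false negative)"
    return "correct decision"


# Hypothetical screening example, H0: "this person is healthy".
print(classify_decision(h0_true=True, rejected_h0=True))    # healthy person flagged as ill
print(classify_decision(h0_true=False, rejected_h0=False))  # cancer missed by the test
print(classify_decision(h0_true=False, rejected_h0=True))   # cancer correctly detected
```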

Type I & Type II Error Source: Statistics, Cliffs Quick Review, Wiley, NY, 2001

Two other kinds of error: In 1948, Frederick Mosteller (1916-2006) proposed a Type III error: "correctly rejecting the null hypothesis for the wrong reason" (1948, p. 61).

Two other kinds of error: In 1970, Marascuilo and Levin proposed a "fourth kind of error" -- a "Type IV error" defined as being the mistake of "the incorrect interpretation of a correctly rejected hypothesis"; which, they suggested, was the equivalent of "a physician's correct diagnosis of an ailment followed by the prescription of a wrong medicine" (1970, p.398).

Other risks of error: These come in addition to many other risks, such as failing to get any of the following right: specifying the problem; the sampling design; the experimental or quasi-experimental design; understanding the kind of data and its limitations; choosing the type of statistical analysis; interpreting the results.

Calculation & Conclusions

The conclusion of the statistical analysis is to accept (strictly, fail to reject) or reject the null hypothesis.

Type I & Type II Errors Source: http://stattrek.com/statistics/data-collection-methods.aspx?tutorial=ap

Larger samples give more precise estimates of the population parameter. Source: Statistics, Cliffs Quick Review, Wiley, NY, 2001
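One way to see why larger samples are more precise is the standard error of the mean, $s/\sqrt{n}$. The sketch below assumes a hypothetical sample standard deviation of s = 10; the helper name `standard_error` is also hypothetical.

```python
import math


def standard_error(s: float, n: int) -> float:
    """Standard error of the mean: s / sqrt(n)."""
    return s / math.sqrt(n)


# Hypothetical sample standard deviation s = 10.
for n in (25, 100, 400):
    print(n, standard_error(10, n))
```

Each fourfold increase in n halves the standard error (2.0, 1.0, 0.5), so precision improves with the square root of the sample size. This only shrinks sampling error; it does nothing about bias.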

How to refer to the significance level of a test (all these statements are equivalent). You should be familiar with these expressions. Source: Statistics, Cliffs Quick Review, Wiley, NY, 2001

Tips to Help Avoid Common Mistakes: Remember to convert between variance and standard deviation. Check whether the hypothesis is one- or two-tailed. For a two-tailed test, split $\alpha$ between the two tails ($\alpha/2$ in each). Always use n - 1 degrees of freedom for a one-sample t-test. Keep statistics ($\bar{x}$, $s$) distinct from population parameters ($\mu$, $\sigma$).
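Splitting $\alpha$ for a two-tailed test shows up directly in the critical value. This sketch assumes a z test (standard normal reference distribution); the helper name `z_critical` is hypothetical. A one-sample t-test with n - 1 degrees of freedom would use somewhat larger t critical values for small samples; the Python standard library has no t distribution, but SciPy's `scipy.stats.t.ppf` provides one.

```python
from statistics import NormalDist


def z_critical(alpha: float, two_tailed: bool = True) -> float:
    """Critical z value for a test at significance level alpha."""
    tail_area = alpha / 2 if two_tailed else alpha  # two-tailed: alpha/2 in each tail
    return NormalDist().inv_cdf(1 - tail_area)


for alpha in (0.05, 0.01):
    print(alpha, round(z_critical(alpha), 3), round(z_critical(alpha, two_tailed=False), 3))
```

At $\alpha = 0.05$ this gives about 1.960 (two-tailed) versus 1.645 (one-tailed): forgetting to split $\alpha$ makes a two-tailed test too easy to pass.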

Choosing the significance level for a test. Remember: the smaller the significance level $\alpha$ (say 0.01 rather than 0.05), the more stringent the test. Choose the level based on: the sample size; the estimated size of the effect being tested; the consequences of making a mistake. Common significance levels: .05 (1 chance in 20); .01 (1 chance in 100); .001 (1 chance in 1,000). Source: Statistics, Cliffs Quick Review, Wiley, NY, 2001
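"1 chance in 20" is literally the Type I error rate: if $H_0$ is true, a test at $\alpha = 0.05$ will still reject it about once in every 20 tries. The simulation sketch below assumes a two-tailed z test on samples drawn from a population where $H_0$ ($\mu = 0$, known $\sigma = 1$) really is true; the sample size, trial count, seed, and function name are hypothetical choices.

```python
import math
import random
from statistics import NormalDist, fmean


def type_one_error_rate(alpha: float, n: int = 25, trials: int = 10_000, seed: int = 42) -> float:
    """Share of two-tailed z tests that reject a TRUE H0 (mu = 0, sigma = 1)."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(0, 1) for _ in range(n)]  # H0 is true by construction
        z = fmean(sample) / (1 / math.sqrt(n))        # z = (x-bar - 0) / (sigma / sqrt(n))
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials


for alpha in (0.05, 0.01):
    print(alpha, type_one_error_rate(alpha))
```

The observed rejection shares land close to 0.05 and 0.01, which is why a smaller $\alpha$ means fewer false positives (at the price of more Type II errors).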

Common Mistakes Source: Statistics, Cliffs Quick Review, Wiley, NY, 2001

Let's take a few simple examples of a calculation.

Remember the normal (Gaussian) distribution, the bell curve: it has a mean and a standard deviation.

The standard deviation defines how spread out the distribution is.
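A quick way to feel what the standard deviation measures is to ask what share of a normal distribution lies within 1, 2 and 3 standard deviations of the mean. This sketch uses Python's `statistics.NormalDist`; the mean of 100 and standard deviation of 15 are hypothetical, and the helper `share_within` is also hypothetical.

```python
from statistics import NormalDist


def share_within(dist: NormalDist, k: float) -> float:
    """Probability that a value falls within k standard deviations of the mean."""
    lower = dist.mean - k * dist.stdev
    upper = dist.mean + k * dist.stdev
    return dist.cdf(upper) - dist.cdf(lower)


# Hypothetical bell curve with mean 100 and standard deviation 15.
scores = NormalDist(mu=100, sigma=15)
for k in (1, 2, 3):
    print(k, round(share_within(scores, k), 4))
```

The shares (about 68%, 95% and 99.7%) are the same for every normal distribution, whatever its mean and standard deviation.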

Remember: The sample statistic (measured) is only an estimate of the population parameter (inferred). Source: Statistics, Cliffs Quick Review, Wiley, NY, 2001

Common Statistical Notation Source: Statistics, Cliffs Quick Review, Wiley, NY, 2001

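Numerical Measures (Formulae): Mean: $\bar{x} = \frac{\sum x}{n}$. Variance: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$. Standard Error of the Mean: $s_{\bar{x}} = \frac{s}{\sqrt{n}}$. Median: the middle value of the ordered values. Nth percentile: the value such that N% of the ordered values lie below it. Source: Statistics, Cliffs Quick Review, Wiley, NY, 2001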
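These measures are available in Python's standard `statistics` module. A minimal sketch using the six measures from the worked example later in this section (3, 9, 1, 2, 5 and 4); note that `statistics.variance` and `statistics.stdev` divide by n - 1, matching the sample formulas, while the standard error is computed by hand as $s/\sqrt{n}$.

```python
import math
import statistics

data = [3, 9, 1, 2, 5, 4]  # the six measures used in the worked example

mean = statistics.mean(data)          # sum of values / n
variance = statistics.variance(data)  # sample variance, divides by n - 1
sd = statistics.stdev(data)           # square root of the sample variance
se = sd / math.sqrt(len(data))        # standard error of the mean, s / sqrt(n)
median = statistics.median(data)      # middle of the ordered values

print(mean, variance, round(sd, 2), round(se, 2), median)
```

With an even number of values, the median is the average of the two middle ordered values (here 3 and 4, giving 3.5).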

Assume that we have the mean of a distribution. We need to find the standard deviation (or its square: the variance)

The Variance is the square of the Standard Deviation

Calculating the Variance and the Standard Deviation. The formula for calculating the variance: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$. The standard deviation is given by: $s = \sqrt{s^2}$.

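Example: calculating Variance and Standard Deviation. For example, using these six measures: 3, 9, 1, 2, 5 and 4: $\sum x = 3 + 9 + 1 + 2 + 5 + 4 = 24$ and $\sum x^2 = 3^2 + 9^2 + 1^2 + 2^2 + 5^2 + 4^2 = 9 + 81 + 1 + 4 + 25 + 16 = 136$. The quantities are then substituted into the shortcut formula to find $\sum (x - \bar{x})^2$: $\sum (x - \bar{x})^2 = \sum x^2 - \frac{(\sum x)^2}{n} = 136 - \frac{24^2}{6}$. Source: Statistics, Cliffs Quick Review, Wiley, NY, 2001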

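Example: calculating Variance and Standard Deviation. $\sum (x - \bar{x})^2 = 136 - \frac{576}{6} = 136 - 96 = 40$. The variance and standard deviation are now found as before: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{40}{5} = 8$ and $s = \sqrt{s^2} = \sqrt{8} \approx 2.83$. Source: Statistics, Cliffs Quick Review, Wiley, NY, 2001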
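The same worked example can be checked in code, computing the sum of squared deviations both directly and with the shortcut formula $\sum x^2 - (\sum x)^2/n$. The function names are hypothetical helpers for this sketch.

```python
import math


def sum_of_squares_definitional(xs):
    """Sum of squared deviations from the mean, computed directly."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)


def sum_of_squares_shortcut(xs):
    """Same quantity via the shortcut: sum(x^2) - (sum(x))^2 / n."""
    return sum(x * x for x in xs) - sum(xs) ** 2 / len(xs)


data = [3, 9, 1, 2, 5, 4]
ss = sum_of_squares_shortcut(data)   # 136 - 576 / 6 = 40.0
variance = ss / (len(data) - 1)      # 40 / 5 = 8.0
sd = math.sqrt(variance)             # about 2.83

print(ss, sum_of_squares_definitional(data), variance, round(sd, 2))
```

The shortcut is convenient by hand, but in floating-point code it can lose precision when the values are large compared with their spread (two big, nearly equal numbers are subtracted), so the direct two-pass version is usually the safer choice in software.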

We will say more about the standard deviation and the variance in a moment

Understanding What Is Behind A Formula

Clear thinking about statistics: understanding what is behind the formula

I want you to understand the logic behind a formula. You do not need to memorize any formula. You get there by asking questions. For example, let's look at the formula for computing the sample variance: $s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2$. Let's ask: why this? And why that?

Why do we square the deviations from the mean? ($s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2$) Because if we add up all the deviations, we always get zero: the positive and negative deviations cancel out. To deal with this problem, we square the deviations. Bonus: squaring also magnifies the large deviations, which helps us better feel the spread of the data.
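A short check of why the raw deviations are useless as a measure of spread, using the six measures from the earlier worked example: they always sum to zero, while the squared deviations do not.

```python
data = [3, 9, 1, 2, 5, 4]
mean = sum(data) / len(data)           # 4.0
deviations = [x - mean for x in data]  # [-1.0, 5.0, -3.0, -2.0, 1.0, 0.0]

print(sum(deviations))                  # positives and negatives cancel out
print(sum(d ** 2 for d in deviations))  # squared deviations cannot cancel
```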

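Why not raise to the power of four (three will not work, because an odd power keeps the sign, so positive and negative deviations can still cancel)? ($s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2$) Squaring does the trick; why should we make life more complicated than it is?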
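To see why an odd power fails, take a hypothetical, clearly spread-out but symmetric data set: its cubed deviations still cancel to zero, whereas even powers (2 and 4) stay positive.

```python
data = [2, 4, 6, 8]                    # hypothetical, clearly spread out
mean = sum(data) / len(data)           # 5.0
deviations = [x - mean for x in data]  # [-3.0, -1.0, 1.0, 3.0]

for power in (2, 3, 4):
    print(power, sum(d ** power for d in deviations))
```

The sum of cubes is 0 even though the data are spread out. The fourth power would work, but squaring is the simplest even power, and taking the square root of the variance brings us back to the original units.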

Why is there a summation in the formula? ($s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2$) To add up the squared deviation of each data point, giving the total sum of squared deviations.

Why do we divide the sum of squares by n - 1? ($s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2$) The measure of spread should also take into account how large the sample is, so we must bring in the sample size. Why? Because the sum of squared deviations keeps growing as more data points are added, even when the data are no more spread out; dividing turns the total into an average.
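A small simulation sketch of this point. The data are hypothetical draws from a normal population with mean 50 and standard deviation 2 (population variance 4); the seed and helper name are also hypothetical. The raw sum of squares grows roughly in proportion to n, while the sum divided by n - 1 stays near 4.

```python
import random


def sum_of_squares(xs):
    """Sum of squared deviations from the sample mean."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)


rng = random.Random(7)
for n in (10, 100, 1_000):
    sample = [rng.gauss(50, 2) for _ in range(n)]  # hypothetical data, variance 4
    ss = sum_of_squares(sample)
    print(n, round(ss, 1), round(ss / (n - 1), 2))
```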

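Why divide by n - 1 and not n? ($s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2$) Dividing by n - 1 makes the sample variance an unbiased estimator of the population variance: averaged over many samples, it equals the population variance. Dividing by n systematically underestimates it, by a factor of (n - 1)/n, because the sample mean $\bar{x}$ is the value that minimizes the sum of squared deviations, so deviations measured from $\bar{x}$ are no larger than deviations from the true population mean. But for larger samples (say over 30), it really does not matter whether you divide by n or n - 1: the results are almost the same, and both are acceptable.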
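A simulation sketch of the bias, assuming a hypothetical normal population with mean 0 and standard deviation 2, so the true population variance is 4. For each sample size it averages the "divide by n" and "divide by n - 1" estimates over many samples; the seed, repetition count and function name are hypothetical choices.

```python
import random


def average_variance_estimates(n: int, reps: int = 10_000, sigma: float = 2.0, seed: int = 1):
    """Average of SS/n and SS/(n - 1) over many samples from N(0, sigma^2)."""
    rng = random.Random(seed)
    total_div_n = 0.0
    total_div_n_minus_1 = 0.0
    for _ in range(reps):
        xs = [rng.gauss(0, sigma) for _ in range(n)]
        mean = sum(xs) / n
        ss = sum((x - mean) ** 2 for x in xs)
        total_div_n += ss / n
        total_div_n_minus_1 += ss / (n - 1)
    return total_div_n / reps, total_div_n_minus_1 / reps


# True population variance is sigma ** 2 = 4.
for n in (5, 50):
    div_n, div_n_minus_1 = average_variance_estimates(n)
    print(n, round(div_n, 2), round(div_n_minus_1, 2))
```

With n = 5, dividing by n averages about 3.2 (that is, 4 x 4/5), while dividing by n - 1 averages about 4. With n = 50 the two versions differ by only 2%.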

Does n - 1 Have a Meaning? ($s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2$) The factor n - 1 is what we call the "degrees of freedom" (but that is another discussion). The degrees of freedom are the number of values in the final calculation of a statistic that are free to vary.

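Explaining the number of values that are free to vary. ($s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2$) For example, if we have two observations, when calculating the mean we have two independent observations; however, when calculating the variance, we have only one independent observation, since the two observations are equally distant from the mean. More generally, the n deviations from the mean always sum to zero, so once n - 1 of them are known, the last one is fixed.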

Degrees of Freedom: The number of independent pieces of information that go into the estimate of a parameter is called the degrees of freedom (df). So, for calculating the sample mean, we have all n observations in the sample. But for calculating the distances from the mean, we have one less. Why? With two observations, both are at the same distance from the mean, so knowing one distance tells us the other.
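A small sketch of the same idea, using the six measures from the earlier worked example plus a hypothetical pair of observations: because the deviations from the mean must sum to zero, knowing all but one of them determines the last, which is why only n - 1 of them are free to vary.

```python
data = [3, 9, 1, 2, 5, 4]
n = len(data)
mean = sum(data) / n
deviations = [x - mean for x in data]

# Suppose we know every deviation except the first one ...
known = deviations[1:]
# ... then the first is forced, because all n deviations must sum to zero.
first = -sum(known)
print(first, deviations[0])

# Hypothetical pair of observations: both sit the same distance from their mean.
pair = [2, 6]
pair_mean = sum(pair) / 2             # 4.0
print([x - pair_mean for x in pair])  # one free value; the other is its mirror image
```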

This example shows how to question statistical formulas, to help you understand them rather than memorize them. Then you can use the concepts better.

Clear thinking is always more important than the ability to calculate something.

Clear Thinking

Social surveys: framing the issues; identifying the target population; sample frame and sample design; instrument design; gathering data; analyzing data; interpreting results.

That is done within the framework of a research design.

Applications: market research; opinion polls; voting expectations; educational or health studies; sociological studies; medical clinical studies; and much more.

Examples of major US/UK surveys: National Election Studies; Gallup poll; General Social Survey; International Social Survey; United Kingdom Census; United States Census; National Health and Nutrition Examination Survey; World Values Survey.

Again: Clear thinking is always more important than the ability to calculate something.

So, One More Time

With clear thinking, you will not be a turkey.

You will learn to fly

Some will even soar like an eagle

Thank You