Methods for Determining Random Sample Size

This document discusses how to determine your random sample size based on the overall purpose of your research project, and outlines methods for determining that sample size.

Prepared by: UW-Stout Office of Planning, Assessment, Research and Quality
Contact: Susan Greene
Revised: 8/13/2012, 3/20/2017
Learn more at www.uwstout.edu

RANDOM SAMPLE DECISION TREE

Random Sample

    PURPOSE: Sample Generalized to Population
        Method: Confidence Intervals
        Need to Know: Population, Alpha, Type of Data, Margin of Error, Estimate of Variance
        Computations Based on Data Type
        N Size Specific Estimates
        Possible To Do By Hand

    PURPOSE: Sample Compared to Population
        Method: Power Analysis
        Need to Know: Statistical Testing needed, Alpha, Power (1-beta), Estimate for Variance, Absolute Effect Size, Balanced or Unbalanced Sub-Group
        Computations Based on Data Type
        Need to Use Application Developed by Statistician

Type of Data

    Categorical
        Nominal data
        2 or more categories, not ordered
        Can assign numbers, but the value is meaningless
        Ex.: (yes/no), (male/female)

    Continuous
        Evenly spaced categories, or a continuous number
        Distance between categories is the same

DEFINITIONS

Population vs. sample: The project population is the group of individuals you want to generalize your results to. These are the people you are interested in describing, comparing, or predicting. The project sample is the part of the population you select to produce the results. Typically, the population is everyone of interest, and the sample is a subset of the population.

Confidence interval: The range, computed from the sample, within which the true population value is expected to lie at a given degree of confidence (typically 95% or 99%).

Null hypothesis: The project research question stated as a hypothesis that assumes there is no effect or no difference between comparison groups. Statistical analysis tests whether the null hypothesis can be rejected or not. Often symbolized as H0.

Alpha: Probability that you reject the null hypothesis when it is true; this is a false positive. Typically, alpha is set by the researcher prior to any statistical testing; common settings are 0.05 and 0.01. Often symbolized as α.

Beta: Probability that you fail to reject (accept) the null hypothesis when it is false; this is a false negative. Often symbolized as β.

Power: Probability that you reject the null hypothesis when it is false, that is, that you are able to detect a true effect. Often symbolized as 1 - β.

Categorical data: Also called nominal data. Data that has 2 or more categories that are not ordered. Numbers can be assigned, but their absolute values have no practical meaning. For example: yes/no responses, male/female.

Continuous data: May have evenly spaced categories or be a continuous number. The absolute distance between categories is the same, so the data can answer the question of how much difference there is between categories.

Margin of error: Tells us about the error due to sampling, that is, how well our sample represents the population.

Variance: Spread of scores/responses around the average.

Effect size: Difference between average observed and expected effects; observed average difference between 2 groups.

COMPUTATIONS

Notes:

1. For surveys or other archival data with more than one type of data, Cochran [1] suggests that the researcher decide which type of data contains the most critical information for the success of the project, and base the sample size on that data type. The researcher could also calculate sample sizes for each type of data and then use the most reasonable number based on available resources.

2. The results of the chosen estimation method are minimum random sample sizes only. For surveys and longitudinal studies, the researcher will need to increase the sample size to allow for non-response and drop-outs. The exact amount of adjustment will depend on the particular circumstances of the study. It is best to consult with resident experts to determine the adjustment factor for a specific project.

3. Sample size selection also depends on the precision of the measurement tool.

Purpose: Sample Generalized to Population

When your project results are meant to generalize from the sample to the broader population, use the methods outlined in this section to select your sample size.

Examples of sample generalized to population:

- You are sending a survey that contains a series of yes/no questions to a random sample of UW-Stout students. You want to collect enough responses to reasonably generalize the results of your random sample to the entire UW-Stout student body. Follow the "Confidence Interval Method: Categorical Data" methodology.

- Your survey contains rating scale questions, for example a Likert-type scale where 1=strongly disagree, 2=disagree, 3=neutral, 4=agree, 5=strongly agree. You want to collect enough responses from your random sample of UW-Stout students to be confident in saying that the average ratings represent the opinion of all current Stout students. Follow the "Confidence Interval Method: Continuous Data" methodology.

Note: if you don't agree that these types of survey questions yield continuous data, please use the "Confidence Interval Method: Categorical Data" methodology.

[1] Cochran, W. G. (1977). Sampling Techniques (3rd edition). New York: John Wiley & Sons.

Confidence Interval Method: Categorical Data

Data needed prior to calculations:

- Specify population size.
- Specify alpha and margin of error, typically set at 0.05 and 5% respectively.
- Specify variance estimate. For a dichotomous variable, use ½ or 0.50 as the estimate of the population proportion unless you have evidence otherwise.

There are two options for calculating sample size for categorical data: using an online tool, or doing the calculation by hand.

1. Online tool at http://www.raosoft.com/samplesize.html

2. Hand calculations using the Cochran method outlined in Bartlett, Kotrlik, and Higgins (2001) [2]:

   $$ n_0 = \frac{t^2 \, p \, (1 - p)}{d^2} $$   (Equation 1)

   n_0 is the minimum estimated sample size
   t is the value of the t-distribution corresponding to the chosen alpha level; for alpha = 0.05 (two-tailed) this is 1.96
   p is the estimate of the population proportion*
   d is the margin of error; Bartlett et al. recommend using 5%

   *When p is unknown, it is generally best to set it at 0.5.

   If the estimate n_0 is greater than 5% of the overall population, make the following correction:

   $$ n_1 = \frac{n_0}{1 + \dfrac{n_0}{\text{Population}}} $$   (Equation 2)

   n_1 is the adjusted minimum estimated sample size
   Population is the total population size

[2] Bartlett, Kotrlik, and Higgins (2001). Organizational Research: Determining Appropriate Sample Size in Survey Research. Accessible online at http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.486.8295&rep=rep1&type=pdf
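As an illustration of the hand calculation, the following Python sketch applies Equation 1 and, when the estimate exceeds 5% of the population, Equation 2. The function name `categorical_sample_size`, the population of 2,000, and the choice to round up to the next whole respondent are assumptions made for this example; they are not part of the Cochran or Bartlett et al. procedure as quoted here.

```python
import math


def categorical_sample_size(p=0.5, d=0.05, t=1.96, population=None):
    """Minimum sample size for categorical data (hypothetical helper).

    p: estimated population proportion (0.5 when unknown)
    d: margin of error as a proportion (0.05 = 5%)
    t: critical value for the chosen alpha (1.96 for alpha = 0.05)
    population: total population size, or None to skip the correction
    """
    # Equation 1: uncorrected minimum sample size
    n0 = (t ** 2) * p * (1 - p) / (d ** 2)

    # Equation 2: apply the correction only if n0 exceeds 5% of the population
    if population is not None and n0 > 0.05 * population:
        n0 = n0 / (1 + n0 / population)

    # Assumption: round up so the margin of error is not exceeded
    return math.ceil(n0)


# Default settings, no population given: n0 = 384.16
print(categorical_sample_size())                  # 385
# Hypothetical population of 2,000: 384.16 > 100, so Equation 2 applies
print(categorical_sample_size(population=2000))   # 323
```

With a very large population the 5% threshold is not reached, so the function returns the uncorrected value from Equation 1.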

Confidence Interval Method: Continuous Data

Hand computation using the method developed by Cochran and outlined in Bartlett et al. The steps are:

1. Specify population size.

2. Specify alpha and margin of error, typically set at 0.05 and 3% respectively.
   a. For rating scale questions, the margin of error would be 0.03 * number of scale points, so for a 5-point scale the margin of error would be 0.15.

3. Specify variance estimate. Both Bartlett et al. (2001) and Lenth [3] (2001) suggest 3 methods for doing this:
   a. Do a pilot study to estimate variance.
   b. Find variance estimates from published literature of similar studies.
   c. Use the researcher's experience.
      i. For surveys, Bartlett et al. suggest using the following estimate for the standard deviation:

         $$ S = \frac{\text{number of points on scale}}{\text{number of standard deviations}} $$   (Equation 3)

         S is the estimate of the standard deviation.
         The typical number of standard deviations used for a distribution is 6 (3 on each side of the mean); this covers more than 99% of the data in the normal distribution.
         For a 5-point scale, this would be 5/6 or 0.83.
      ii. Lenth suggests constructing a histogram or other diagram of how you think the data should turn out, and estimating the variance from it.

4. Calculate minimum sample size:

   $$ n_0 = \frac{t^2 \, S^2}{d^2} $$   (Equation 4)

   n_0 is the minimum estimated sample size
   t is the value of the t-distribution corresponding to the chosen alpha level; for alpha = 0.05 (two-tailed) this is 1.96
   S is the estimate of the standard deviation
   d is the margin of error

5. If the estimate n_0 is greater than 5% of the overall population, make the following correction:

   $$ n_1 = \frac{n_0}{1 + \dfrac{n_0}{\text{Population}}} $$   (Equation 5)

   n_1 is the adjusted minimum estimated sample size
   Population is the total population size

[3] Lenth, R. V. (2001). "Some Practical Guidelines for Effective Sample Size Determination." The American Statistician, 55, 187-193.
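The steps for continuous data can be sketched in Python as follows, using the 5-point scale values given above (S = 5/6, margin of error = 0.15). The function names, the population of 2,000, and the rounding up to a whole respondent are assumptions made for illustration only.

```python
import math


def scale_sd_estimate(scale_points, num_sds=6):
    """Equation 3: rough standard deviation estimate for a rating scale."""
    return scale_points / num_sds


def continuous_sample_size(s, d, t=1.96, population=None):
    """Minimum sample size for continuous data (hypothetical helper).

    s: estimated standard deviation
    d: margin of error in scale units
    t: critical value for the chosen alpha (1.96 for alpha = 0.05)
    population: total population size, or None to skip the correction
    """
    # Equation 4: uncorrected minimum sample size
    n0 = (t ** 2) * (s ** 2) / (d ** 2)

    # Equation 5: apply the correction only if n0 exceeds 5% of the population
    if population is not None and n0 > 0.05 * population:
        n0 = n0 / (1 + n0 / population)

    # Assumption: round up to the next whole respondent
    return math.ceil(n0)


s = scale_sd_estimate(5)   # 5/6, about 0.83
d = 0.03 * 5               # 3% margin of error on a 5-point scale = 0.15

print(continuous_sample_size(s, d))                    # 119 (n0 is about 118.57)
print(continuous_sample_size(s, d, population=2000))   # 112
```

Because both S and d scale with the number of scale points, the uncorrected estimate comes out the same for any scale length under these default choices; a different variance estimate (from a pilot study or the literature) would change it.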

Purpose: Sample Compared to Population

When your project results are meant to compare the sample to the broader population, use the methods outlined in this section to select your sample size.

Example of sample compared to population:

You are surveying a random sample of UW-Stout students to determine their satisfaction with different aspects of campus life.

- Your survey contains rating scale questions, for example a Likert-type scale where 1=strongly disagree, 2=disagree, 3=neutral, 4=agree, 5=strongly agree.
- You have collected demographic data such as gender and year in school.
- You want to collect enough responses from your random sample of UW-Stout students to be confident in saying that the differences in the average ratings by demographic group represent the differences in the opinions by demographic group of all current Stout students. For example:
    - Are there differences in the average ratings between males and females?
    - Are there differences in the average ratings among the year-in-school groups?

Power Method

Option 1: Free online tool developed by Russell Lenth, located at http://www.cs.uiowa.edu/~rlenth/power/#advice

Option 2: Free software to download and run on your PC; information is located at http://www.psycho.uni-duesseldorf.de/abteilungen/aap/gpower3. G*Power offers more options for selecting test type than the Lenth tool.

You will need the following information before obtaining your sample size results:

- Statistical test you are interested in running
- Alpha: typically set at 0.05
- Power (1-beta): typically set at 0.80
- Variance estimate: see the discussion of variance estimates under the confidence interval method for continuous data
- Absolute effect size estimate. Lenth (2001) advises 2 alternatives:

  1. Based on the Principal Investigator's knowledge of the project, determine the effect that you hope to see. This would establish an upper bound on the absolute effect size and a lower bound on the sample size. Then ask if an effect half that size would be important, noting that in most cases this would quadruple the sample size. This would help to establish a lower bound on the absolute effect size and an upper bound on the sample size. Keeping the power constant, you can use the different effect sizes to find a range of sample sizes, review these keeping in mind your purpose and resources, and then select your final sample size. Or conversely, you can use different effect sizes and a given sample size to estimate the power, review these keeping in mind your purpose and resources, and then select your final sample size.

  2. Examine published literature related to the study and see what the typical effect sizes are. Could you reasonably expect the same effect size? If so, use this as your base absolute effect size.

- Determine whether you will have balanced or unbalanced sub-groups. For example, if you are making comparisons between men and women, will you have equal numbers in your response sample?
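To illustrate why halving the effect size roughly quadruples the sample size, the sketch below uses a common normal-approximation formula for comparing the means of two balanced groups with a two-sided test: n per group = 2 * (z_(1-alpha/2) + z_power)^2 * sd^2 / effect^2. This formula is an assumption introduced for the example; the source does not give a formula and refers instead to Lenth's tool and G*Power, which use exact methods and will typically return slightly larger numbers. The function name `n_per_group` and the effect sizes of 0.5 and 0.25 rating points are hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(effect, sd, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-group mean comparison.

    Assumptions (not from the source): normal approximation, two-sided
    test, equal group sizes, and a common standard deviation `sd`.
    effect: absolute difference in means you want to detect
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    z_power = NormalDist().inv_cdf(power)           # about 0.84 for power = 0.80
    n = 2 * ((z_alpha + z_power) * sd / effect) ** 2
    return math.ceil(n)


sd = 5 / 6   # standard deviation estimate for a 5-point scale (Equation 3)

# Upper bound on effect size (hoped-for effect) gives a lower bound on n
print(n_per_group(effect=0.5, sd=sd))

# Half that effect size gives roughly four times the sample size
print(n_per_group(effect=0.25, sd=sd))
```

Running the function over a range of candidate effect sizes while holding power constant gives the range of sample sizes described in alternative 1 above. Unbalanced sub-groups are not handled by this sketch.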