Knowledge is Power: The Basics of SAS Proc Power


Elaina Gates, California Polytechnic State University, San Luis Obispo

ABSTRACT

There are many statistics applications where it is important to understand how the power function of a specific distribution behaves. Coding a power function from scratch can be an arduous process and can become complicated when investigating effect size and sample size. This presentation covers the basic uses of proc power for tests of proportions and includes examples of power analyses. It also demonstrates how simple it is to use proc power to generate plots of power curves and obtain other valuable information.

INTRODUCTION

In hypothesis testing, there is a null hypothesis and an alternative hypothesis. When a result is found to be statistically significant, we reject the null in favor of the alternative. Power is defined as the probability of correctly rejecting the null hypothesis, that is, of rejecting it when the alternative is true. More generally, the power function gives the probability of rejecting the null hypothesis for any true value of the parameter, with no assumption about which hypothesis holds. Many factors play a part in calculating power, one of which is sample size. Choosing a sample size is perhaps the most common reason for studying power curves prior to a study. To save time and money, researchers use power analysis to determine how large a sample they need in order to show statistical significance. This paper outlines how to use proc power, specifically with proportions, to determine a suitable sample size, how to calculate power after a sample size is chosen, and how to interpret the plot of the power curve.

DETERMINING SAMPLE SIZE

Power analysis is useful in determining the number of subjects needed in a study or a clinical trial. One application is deciding how many subjects are needed in a control group versus a treatment group to achieve a specific level of power. For example, a new drug is being developed to treat migraine headaches.
The current treatment reduces symptoms in 40% of patients; the new drug will be put into production only if its effectiveness is at least 15 percentage points higher than that of the current treatment (that is, at least 55%). For this experiment, we need two groups of patients. One group will be given the current drug and the other will be given the new drug. The results of the two groups will be compared to determine whether the new drug is at least 15 percentage points more effective. Before conducting this experiment, the researchers need to know how many subjects will be needed in each group to achieve power of at least 0.8.

Using proc power, we will conduct a power analysis for this experiment. The code is shown below. We will use the twosamplefreq statement. The test we will use to compare the two groups is a Pearson chi-square test, which is specified with the test= option. The default in proc power is a two-sided test. In this study we will change it to a one-sided test because we are interested only in an improvement in symptoms. Finally, we give the level of power we want to achieve after power= and set ntotal= to missing (ntotal = .) so that SAS calculates the minimum sample size.

    power = 0.8 ntotal = .;
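Only the last line of this listing survives in the text. A minimal sketch of the complete step, reconstructed from the description above, might look like the following. The use of sides = 1 for the one-sided test, the space-separated proportions, and the layout are assumptions, not the author's original code.

```sas
/* Sketch only: reconstructed from the prose, not the original listing. */
proc power;
  twosamplefreq
    test = pchi                      /* Pearson chi-square test */
    sides = 1                        /* one-sided test (assumed option for the change described) */
    groupproportions = (0.4 0.55)    /* current drug, new drug */
    power = 0.8                      /* target power */
    ntotal = .;                      /* missing value: solve for total sample size */
run;
```

Alpha is not set here, so the procedure's default significance level of 0.05 applies. Exactly one of power= and ntotal= is left missing, which tells proc power which quantity to solve for.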


Figure 1. Proc Power Results for Migraine Example Part I

After running the power analysis, the output shows us that in order to achieve power of at least 0.8 we must have a total sample size of 272 subjects. Since we haven't changed any options regarding the weights of the two groups, the default of equal sample sizes applies, which gives 136 subjects per group. This can be altered with the groupweights= option shown below, which gives the minimum sample size for two groups where one has twice as many subjects as the other.

    groupweights = (1 2) power = 0.8 ntotal = .;

DETERMINING POWER FOR A GIVEN SAMPLE SIZE

When conducting an experiment where the sample size has already been selected, you can use proc power to calculate the power and to produce plots showing how the sample size affects the power. Suppose that only 160 subjects are qualified to participate in the migraine treatment study described above. The researchers want to know how powerful the test will be with this sample size. We now change the code to give the total number of subjects and set power= to missing.

    ntotal = 160 power = .;
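The two fragments in this section show only the options that change. Written out in full, and assuming the same test, one-sided setting, and proportions as in the first analysis, they might look like this sketch:

```sas
/* Sketch only: the statements around the surviving fragments are assumed. */

/* Unequal allocation: the second group gets twice as many subjects. */
proc power;
  twosamplefreq
    test = pchi
    sides = 1                        /* assumed, carried over from the first analysis */
    groupproportions = (0.4 0.55)
    groupweights = (1 2)             /* relative sizes of group 1 and group 2 */
    power = 0.8
    ntotal = .;                      /* solve for total sample size */
run;

/* Fixed total of 160 subjects: solve for power instead. */
proc power;
  twosamplefreq
    test = pchi
    sides = 1                        /* assumed, carried over from the first analysis */
    groupproportions = (0.4 0.55)
    ntotal = 160
    power = .;                       /* missing value: solve for power */
run;
```

The only difference between solving for sample size and solving for power is which of ntotal= and power= is set to missing.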

Figure 2. Proc Power Results for Migraine Example Part II

Effect size can also have an impact on the power of a test. It is harder to detect a small difference between the null and alternative hypotheses than a larger difference. In the migraine example, the effect size is relatively small. Using the plot statement with proc power, we can look at how different effect sizes and sample sizes change the power of this test in Figure 3.

Now, instead of including only 0.4 and 0.55, the proportions of subjects whose symptoms improve with the current and new medication respectively, we can include many pairs. We have included a smaller effect size, represented by the pair 0.4 and 0.5, as well as two larger effect sizes. The output will now report the power for all of these pairs of proportions. After the twosamplefreq statement, we have included a plot statement, which generates a plot in the output. By including x=n, we plot sample size on the x-axis. I have also included the limits of the x-axis after min= and max=.

    twosamplefreq test = pchi
        groupproportions = (0.4, 0.5) (0.4, 0.55) (0.4, 0.6) (0.4, 0.65)
        power = .
        ntotal = 300;
    plot x = n min = 100 max = 500;
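For readers who want to run this listing, a minimal self-contained sketch wraps it in a proc power step. The proportions are written with spaces instead of commas, and sides= is left out because the surviving listing does not show it, so this sketch runs the default two-sided test. Both choices are assumptions rather than the author's exact code.

```sas
/* Sketch only: proc power / run wrapper added around the listing in the text. */
proc power;
  twosamplefreq
    test = pchi
    /* each parenthesized pair is one scenario: (current drug, new drug) */
    groupproportions = (0.4 0.5) (0.4 0.55) (0.4 0.6) (0.4 0.65)
    power = .                        /* solve for power in each scenario */
    ntotal = 300;                    /* total N used for the tabulated results */
  plot x = n min = 100 max = 500;    /* power curves over total N from 100 to 500 */
run;
```

The table in the output uses ntotal = 300, while the plot statement sweeps the total sample size over its own range and draws one curve per pair of proportions.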


Figure 3. Plot and Results Generated by Proc Power

The plot in Figure 3 illustrates the effects of increasing sample size and increasing effect size. We can see that as the sample size increases, so does the power. The power also increases as the effect size gets larger. This application of proc power saves a lot of time. If we were to code this plot from scratch, we would need many loops and iterations. The output also reports the power for each of the effect sizes.

CONCLUSION

The POWER procedure saves the user coding time and provides all of the relevant output and plots needed for a power analysis. As a student, I have found proc power to be extremely beneficial because there are many assignments in which I have needed to conduct a power analysis. It is also beneficial for SAS users who design experiments. The plot statement is extremely flexible and more user-friendly than coding from scratch. Other available tests include two-sample tests for means as well as one-sample tests for both means and proportions. The procedure also has options for tests involving linear regression, survival, ANOVA, and logistic regression. Proc power is a great tool for calculating power, determining sample sizes, and creating plots.

REFERENCES

SAS Institute Inc., SAS 9.2 User Guide. The Power Procedure, Cary, NC: SAS Institute Inc., 2016

SAS Data Analysis Examples. UCLA. Statistical Consulting Group. from

ACKNOWLEDGMENTS

I would like to acknowledge Professor Matthew Carlton for his phenomenal instruction on power curves and hypothesis testing. I would like to thank Rebecca Ottesen for her advice on my presentation as well as her never-ending help and instruction in my SAS endeavors.

CONTACT INFORMATION

Your comments and questions are valued and encouraged. Contact the author at:

Name: Elaina Gates
E-mail: elainagates@gmail.com
Web:

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.
