
Paper 109

A SAS Macro to Investigate Statistical Power in Meta-analysis

Jin Liu, Fan Pan
University of South Carolina Columbia

ABSTRACT

Meta-analysis is a quantitative review method that synthesizes the results of individual studies on the same topic. Cohen's d was selected as the effect size index in the current study because it is widely used in practical meta-analysis and there are few simulation studies investigating this index. Statistical power is conceptually defined as the probability of detecting a real, existing effect or difference. The current analytical procedure for power in meta-analysis involves approximations, and the accuracy of the procedure is uncertain. Simulation can calculate power more accurately by addressing the approximations in the analytical formulas. Simulation studies involve generating data from computer programs to study the performance of statistical estimates under different conditions (Hutchinson & Bandalos, 1997). If there is no real effect, researchers hope to retain the null hypothesis. If there is a real, existing effect, researchers hope to reject the null hypothesis (i.e., to have high statistical power). In each simulation step, a p-value is obtained to decide whether the null hypothesis is rejected or retained. The proportion of rejected null hypotheses across all simulation steps is the simulated statistical power when there is a non-zero treatment effect. The purpose of the study is to inform meta-analysis practitioners of the degree of discrepancy between analytical power and simulated power in the meta-analysis framework (i.e., fixed- and random-effects models). A SAS macro was developed to give researchers power estimates under the following conditions: simulated power in the fixed-effects model, analytical power in the fixed-effects model, simulated power in the random-effects model, and analytical power in the random-effects model. As long as researchers know the parameters needed in their meta-analysis, they can run the SAS macro to obtain the power values they need under different conditions. Results indicate that the analytical power was close to the simulated power, while some conditions in the random-effects models showed noticeable power discrepancies. This study will yield a better understanding of statistical power in real meta-analyses.

INTRODUCTION

Researchers have expressed concerns over power in meta-analysis. Cafri, Kromrey, and Brannick (2010) asserted that power analysis is especially important in meta-analysis because such studies summarize similar research and have a greater influence on theory and practice. Field (2001) investigated different meta-analytic models for correlation coefficient studies and examined which procedures produced the most accurate and powerful results under different conditions. Cohn and Becker (2003) demonstrated how meta-analysis can increase statistical power. In their study, three indices were examined: the standardized mean difference, Pearson's r, and the odds ratio. Statistical power can be increased by reducing the standard error of the weighted effect size. However, increasing the number of studies does not always increase statistical power, and the between-study variance should be considered under the random-effects model. Sterne, Gavaghan, and Egger (2000) found that power was limited in meta-analyses based on a small number of individual studies; in this case, results should be interpreted with great care.
The results can be biased if researchers run analyses with insufficient statistical power (Ellis, 2010). Thus, statistical power in meta-analysis has great implications for study results, which is the main focus of the current study. The current study seeks to extend previous research by gauging the performance of statistical power in meta-analysis (two-group differences on continuous outcomes) under various conditions, such as the number of studies, the sample sizes of the individual studies, and the between-study variances. Although analytical power formulas have been developed (Hedges & Pigott, 2001), the accuracy of these formulas is unknown. This paper introduces an alternative way to calculate statistical power using simulation. Implementation of power analysis among practicing researchers can be impeded by the technical expertise required. Thus, a SAS macro that generates both the analytical power and the simulated power was developed to help applied researchers estimate statistical power accurately in meta-analysis studies.

THEORETICAL FRAMEWORK

Two kinds of conceptual models can be employed in meta-analysis. They are formulated according to the properties of the effect sizes in the individual studies. A fixed-effects model treats the population effect sizes of the individual studies as the same; in other words, there is a common population effect size across studies. By contrast, a random-effects model treats the population effect sizes of the individual studies as a random sample of all possible effect sizes with an underlying distribution (e.g., a normal distribution). In the fixed-effects model, the only reason an observed effect size varies is random sampling error. In a random-effects model, the observed effect size is influenced by both random sampling error and true differences between studies.
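In standard notation (a textbook formulation; the paper itself does not write these equations out), the observed effect size $T_i$ of study $i$ can be expressed as

$$T_i = \theta + \varepsilon_i \qquad \text{(fixed-effects model)},$$

$$T_i = \theta + u_i + \varepsilon_i, \quad u_i \sim N(0, \tau^2) \qquad \text{(random-effects model)},$$

where $\theta$ is the common (or average) population effect size, $\varepsilon_i$ is the within-study sampling error, and $\tau^2$ is the between-study variance.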

While discussing model selection, Hedges and Vevea (1998) stated that fixed-effects models are designed to make inferences about a population of studies exactly like those sampled, whereas random-effects models are designed to draw inferences about a population of studies that may not be exactly identical to those sampled. Fixed-effects models have been used frequently in practice, and random-effects models have seen increased use over time (Cafri, Kromrey, & Brannick, 2010). To date, there are no absolute guidelines for model selection. The model selected does, however, affect how the effect size indices are combined in the meta-analysis.

A key concept in meta-analysis is the effect size. Researchers decide to reject or retain the null hypothesis based on the p-value of a test statistic (statistical significance), while they use the effect size to measure the magnitude of an effect, sometimes referred to as the practical significance of a test. As Cohen (1990, p. 1310) suggested, "the primary product of a research inquiry is one or more measures of effect size, not p values." Effect size is not only important in primary studies but also critical in meta-analysis, as scholars combine the effect sizes from individual studies to obtain an average estimate of the treatment effect across studies. Ellis (2010) provides a good summary of the different kinds of effect sizes. There are two major families of effect size: the d family (e.g., odds ratio, Cohen's d; differences between groups) and the r family (e.g., the Pearson correlation, Cohen's f; measures of association). The current study focuses on two-group differences on continuous outcomes.

In meta-analysis, the effect sizes are combined to compute the average effect size and its variance, which together form a test statistic (i.e., a z test). One can then use the test statistic to decide whether to retain or reject the null hypothesis about the average effect size. Power functions for this test have been developed by Hedges and Pigott (2001); the procedure is similar to a power analysis for a z test. Power is calculated under the assumption that there is a mean difference between the two groups: it is the probability of identifying the real, existing effect after combining the effect sizes of the individual studies into an average effect size and a standard error estimate. The procedure involves approximations, such as assuming equal sampling variances across studies, so the accuracy of the resulting power values is uncertain.

Simulation can also be used to calculate power. Simulation studies are widely used in empirical research; they involve generating data from computer programs to study the performance of statistical estimates under different conditions (Hutchinson & Bandalos, 1997). The idea of simulation follows the same logic as hypothesis testing. If there is no real effect, researchers hope to retain the null hypothesis most of the time and thereby control the Type I error. If there is a real, existing effect, researchers hope to reject the null hypothesis as often as possible. Simulation can be used to check Type I error and power by repeating the same statistical procedure many times, under regular model assumptions or under different ones. For instance, in a two-sample t test, the effect size can be simulated a certain number of times (e.g., 1,000) by assuming a certain mean (whether or not there is a population effect) and standard deviation, with the sample size supplied in each repetition (e.g., 100). This can be done with the help of computer software (e.g., SAS or R).
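As a minimal sketch of this logic (ours, not the paper's macro; the data set and variable names are illustrative), the following SAS step simulates 1,000 two-sample t tests with n = 50 per group and a population effect size of 0.5, approximating the observed d by shifting a central t deviate in the same way the macro below does, and then computes the rejection rate:

/* Minimal sketch (not from the paper): simulated power of a single
   two-sample t test; 1,000 replications, n = 50 per group,
   population effect size 0.5, alpha = .05.                        */
DATA ttestsim;
   CALL STREAMINIT(12345);        /* reproducible random numbers   */
   n1 = 50; n2 = 50;
   df = n1 + n2 - 2;
   DO rep = 1 TO 1000;
      /* observed d = population effect + sampling error           */
      d = 0.5 + RAND('T', df)*SQRT(1/n1 + 1/n2);
      t = d/SQRT(1/n1 + 1/n2);                /* t statistic       */
      p = 2*(1 - PROBT(ABS(t), df));          /* two-sided p-value */
      reject = (p < 0.05);
      OUTPUT;
   END;
RUN;

/* Simulated power = proportion of replications rejecting H0 */
PROC SQL;
   SELECT AVG(reject) AS simulated_power FROM ttestsim;
QUIT;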
Repeating the same process many times allows researchers to calculate the rejection rate of the test, that is, the simulated power. In each repetition, a p-value is obtained and used to make a statistical decision (reject or retain the null hypothesis); finally, the 1,000 p-values are stored in the output. To verify the Type I error rate, roughly 5% of the p-values should lead to rejection (< .05) when there is no population effect. The same strategy applies to simulated statistical power, or one minus the Type II error rate: the proportion of rejected null hypotheses among all the simulated tests is the simulated statistical power when the simulated tests assume a non-zero treatment effect.

The two-group mean difference on continuous outcomes is used as the effect size index because it is widely used in practice. For instance, researchers may investigate gender differences on certain continuous outcomes, such as academic achievement levels or behavioral problems. A PsycINFO search for "meta-analysis" and "gender difference" returns more than 400 articles, a fair number of which use an effect size index for a two-group difference on a continuous outcome. However, there are few simulation studies on this index; comparatively more simulation studies have used the effect size r (e.g., Field, 2001). Cohen's d is used here as the effect size index of each study to investigate the mean difference between two groups. The formula for Cohen's d (found in the major textbooks on effect size) is

$$d = \frac{M_1 - M_2}{s_{pool}},$$

where $M_1$ and $M_2$ are the sample means of the two groups and $s_{pool}$ is their pooled standard deviation. For example, if $M_1 = 105$, $M_2 = 100$, and $s_{pool} = 10$, then $d = 0.5$. Practicing researchers do not need to know the tedious details of the meta-analysis procedure if a SAS macro can be developed for them; thus, the remaining formulas are not included in this paper. Detailed information can be found in Statistical Power Analysis for the Social and Behavioral Sciences: Basic and Advanced Techniques (Liu, 2013).

CODE AND SAMPLE OUTPUT

The current study simulates statistical power in meta-analysis and compares the results with the analytical power. The formulas used in the macro were taken from related textbooks (e.g., Liu, 2013). The following input parameters are needed to run the macro: the average sample size of each study (N), the average sample size of each group in each study (N1 and N2), the population effect size (PES), the number of studies (I), the number of simulations (sims), and the Type I error rate (alpha, usually set to 0.05). Researchers can input the values according to their research needs. By running the SAS macro under the assigned conditions, meta-analysis researchers obtain the corresponding statistical power. In the complete results, they see four power values: fixed-effects analytical power, fixed-effects simulated power, random-effects analytical power, and random-effects simulated power. Based on the discrepancies and the model they want to select, they can decide which power values to report in their research studies.
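For reference (this restates what the macro computes rather than anything beyond it), each study's d is weighted by the inverse of its estimated sampling variance, computed in the code below as Variancewithin with $N = n_1 + n_2$:

$$v_i = \frac{n_1 + n_2}{n_1 n_2} + \frac{d_i^2}{2(n_1 + n_2)}, \qquad w_i = \frac{1}{v_i},$$

and the combined effect size and its standard error are

$$\bar{d} = \frac{\sum_i w_i d_i}{\sum_i w_i}, \qquad SE(\bar{d}) = \frac{1}{\sqrt{\sum_i w_i}},$$

which yield the z statistic $z = \bar{d}/SE(\bar{d})$ tested in each replication.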

Below is the fixed-effects macro code, with the two power values for one selected condition. The macro code for the random-effects model is similar, except that the between-study variance parameter must be specified, which is not needed in the fixed-effects model. Only the two random-effects power estimates under the same condition, with a defined between-study variance, are shown in the output; that code is not provided in the paper. Interested researchers can contact the authors for the random-effects model code.

%LET N=60;        /* total sample size per study        */
%LET N1=30;       /* group 1 sample size                */
%LET N2=30;       /* group 2 sample size                */
%LET I=20;        /* number of studies                  */
%LET PES=0.1;     /* population effect size             */
%LET sims=10000;  /* number of simulation replications  */
%LET alpha=0.05;  /* Type I error rate                  */

/* Generate I effect sizes per replication, with inverse-variance weights */
DATA fixeffect;
   DO sampleid = 1 TO &sims;
      DO i = 1 TO &I;
         x  = RAND("t", &N-2);            /* central t deviate            */
         d0 = x*SQRT(1/&N1 + 1/&N2);      /* sampling error of d          */
         ES = d0 + &PES;                  /* simulated observed d         */
         Variancewithin = &N/(&N1*&N2) + (ES*ES*0.5)/&N;
         Weight     = 1/Variancewithin;
         WeightES   = Weight*ES;
         Weightsq   = Weight*Weight;
         WeightESsq = Weight*ES*ES;
         OUTPUT;
      END;
   END;
RUN;

/* Sum the weights and weighted effect sizes within each replication */
PROC SQL;
   CREATE TABLE sum AS
   SELECT SUM(Weight)     AS SumWeight,
          SUM(WeightES)   AS SumWd,
          SUM(Weightsq)   AS SumWsquare,
          SUM(WeightESsq) AS SumWdsquare
   FROM fixeffect
   GROUP BY sampleid;
QUIT;

/* z test on the weighted average effect size in each replication */
DATA power;
   SET sum;
   WeightedD = SumWd/SumWeight;
   SEM   = SQRT(1/SumWeight);
   Zstat = WeightedD/SEM;
   p = 2*(1 - CDF('NORMAL', ABS(Zstat), 0, 1));   /* two-sided p-value */
   IF p < &alpha THEN RejectH0 = 1;
   ELSE RejectH0 = 0;
RUN;

/* Simulated power = proportion of replications rejecting H0 */
PROC SQL;
   CREATE TABLE P AS
   SELECT (SUM(RejectH0)/&sims) AS value
   FROM power;
QUIT;

PROC PRINT DATA=P NOOBS;
   TITLE 'Fixed-effects simulated power';
   FORMAT value 5.4;
RUN;
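The analytical power computed in the next DATA step is the Hedges and Pigott (2001) formula for a two-sided z test, written here in the notation of the code (PES is $\delta$, I is the number of studies $k$, and vi is the common within-study variance $v$):

$$\lambda = \frac{\sqrt{k}\,\delta}{\sqrt{v}}, \qquad \text{power} = 1 - \Phi\!\left(z_{1-\alpha/2} - \lambda\right) + \Phi\!\left(z_{\alpha/2} - \lambda\right),$$

where $\Phi$ is the standard normal distribution function.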

/* Analytical power (Hedges & Pigott, 2001) under the same condition */
DATA equation;
   vi    = &N/(&N1*&N2) + 0.5*&PES*&PES/&N;   /* common variance of d    */
   lamda = SQRT(&I)*&PES/SQRT(vi);            /* noncentrality parameter */
   value = 1 - PROBNORM(PROBIT(1-&alpha/2) - lamda)
             + PROBNORM(PROBIT(&alpha/2) - lamda);
   OUTPUT;
RUN;

DATA equation (KEEP=value);
   SET equation;
RUN;

PROC PRINT DATA=equation NOOBS;
   TITLE 'Fixed-effects analytical power';
   FORMAT value 5.4;
RUN;

/* Printing step from the random-effects version of the macro (the full
   random-effects code is not shown here; it is available on request) */
DATA equation (KEEP=power);
   SET equation;
RUN;

PROC PRINT DATA=equation;
   TITLE 'Random-effects analytical power';
RUN;

The results indicate that the analytical power and the simulated power are close to each other, with minimal discrepancy. Power in the fixed-effects model is higher than power in the random-effects model.

Fixed-effects simulated power      .3967
Fixed-effects analytical power     .4095
Random-effects simulated power     .3510
Random-effects analytical power    .3459

Output 1. Output from the macro code
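To examine another condition, only the %LET statements at the top of the macro need to change. For example (the values below are illustrative, not taken from the paper):

/* Illustrative alternative condition (values are ours, not the paper's):
   40 studies of 100 subjects each, population effect size 0.2 */
%LET N=100;  %LET N1=50;  %LET N2=50;
%LET I=40;   %LET PES=0.2;
%LET sims=10000;  %LET alpha=0.05;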

CONCLUSION

This paper provides researchers with a SAS macro that calculates the power of a two-sample mean difference test in meta-analysis, which makes performing a power analysis far more convenient. Researchers need only collect the information required for the input parameters to receive an accurate estimate of statistical power. We anticipate that there may be gaps between simulated power and analytical power under certain conditions, in which case the simulated power is suggested. We plan to keep developing the macro to bring it closer to practical conditions, for example by considering moderator effects. We hope that applied researchers who are new to power analysis in meta-analysis will find the macro helpful in their analyses!

REFERENCES

Cafri, G., Kromrey, J. D., & Brannick, M. T. (2010). A meta-meta-analysis: Empirical review of statistical power, Type I error rates, effect sizes, and model selection of meta-analyses published in psychology. Multivariate Behavioral Research, 45(2), 239-270.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Hillsdale, NJ: Erlbaum.

Cohen, J. (1990). Things I have learned (so far). American Psychologist, 45(12), 1304-1312.

Cohn, L. D., & Becker, B. J. (2003). How meta-analysis increases statistical power. Psychological Methods, 8(3), 243-253.

Ellis, P. D. (2010). The essential guide to effect sizes: Statistical power, meta-analysis, and the interpretation of research results. New York, NY: Cambridge University Press.

Field, A. P. (2001). Meta-analysis of correlation coefficients: A Monte Carlo comparison of fixed- and random-effects methods. Psychological Methods, 6(2), 161-180.

Hedges, L. V., & Pigott, T. D. (2001). The power of statistical tests in meta-analysis. Psychological Methods, 6(3), 203-217.

Hedges, L. V., & Vevea, J. (1998). Fixed- and random-effects models in meta-analysis. Psychological Methods, 3(4), 486-504.

Hutchinson, S. R., & Bandalos, D. L. (1997). A guide to Monte Carlo simulations for applied researchers. Journal of Vocational Education Research, 22(4), 233-245.

Liu, X. (2013). Statistical power analysis for the social and behavioral sciences: Basic and advanced techniques. New York, NY: Routledge.

Sterne, J. A., Gavaghan, D., & Egger, M. (2000). Publication and related bias in meta-analysis: Power of statistical tests and prevalence in the literature. Journal of Clinical Epidemiology, 53, 1119-1129.

CONTACT INFORMATION

Your comments and questions are valued and encouraged. Contact the author at:

Jin Liu
University of South Carolina Columbia
E-mail: jinliu091128@hotmail.com

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.