Power of a Clinical Study


M. Yusuf Celik¹, Editor-in-Chief
¹ Prof. Dr., Biruni University, Medical Faculty, Dept of Biostatistics, Topkapi, Istanbul.

Abstract

The probability of not committing a Type II error is called the power of a hypothesis test. Making the wrong decision can have serious consequences and risks. Sample size is closely tied to statistical power, which is the ability of a study to detect a statistically significant difference when one truly exists. A trade-off exists between a feasible sample size and adequate statistical power, and sample size is important primarily because of its effect on power. Increasing the power of a study depends on the following main factors: increasing alpha, conducting a one-tailed test, increasing the effect size, decreasing random error, and increasing the sample size. Underestimating the sample size may cause a drug to appear statistically non-significant even though a clinically significant effect exists. As a result, an underpowered study may not yield useful results and may unnecessarily put participants at risk. Overall, researchers can and should attend to power and calculate it at the beginning of the study; moreover, computer programs are available to perform the calculation.

Keywords: Power, Sample Size, Type I Error, Type II Error

Researchers often disregard the power of their research, yet the results of studies with little power are not reliable. Statistical methods and techniques are a tool, and making the wrong decision can have serious consequences and risks. It is essential that the results of an analysis be prepared in a consistent and standard way so that comparisons of the information are meaningful and lead to the right decision. Underestimating the sample size may cause a drug to appear statistically non-significant even though a clinically significant effect exists.

There are two main reasons why a study may not show a significant difference between the groups being studied (e.g. in a randomized trial of a new drug, or a case-control study testing the effect of an exposure on a disease):

1) There really was no significant difference (a true negative result), or

2) There was a difference but the study failed to detect it (a false negative result). This may arise because the study was poorly designed (e.g. it used imprecise measurements) or because the study was too small (in statistical jargon, it "lacked power").

Statistical power is affected by three factors:

1) The difference in outcome rates between the two groups. The smaller the difference, the more power (and hence the larger the sample) needed to detect it.
2) The level of significance at which you hope to show the difference (e.g. p < 0.05 or p < 0.001). Chasing a smaller p value takes more study power.
3) The frequency of the outcome in the two groups. Imagine an exposure that increases incidence by a third: it is easier to show a difference between 30 and 45 percent than between 10 and 15 percent. Maximum power is reached when roughly half of the people studied have the outcome of interest (1).

Sample size is closely tied to statistical power, which is the ability of a study to detect a statistically significant difference when one truly exists. A trade-off exists between a feasible sample size and adequate statistical power, and sample size is important primarily because of its effect on power. Statistical power is the probability that a statistical test will indicate a significant difference when one truly exists. It is analogous to the sensitivity of a diagnostic test, and one could mentally substitute the word "sensitivity" for the word "power" during statistical discussions (2).

Power is the probability of rejecting the null hypothesis when the alternative hypothesis is true; it measures the ability of a test to reject the null hypothesis when it should be rejected. At a given significance level, the power of the test is increased by a larger sample size. The minimum accepted level is generally considered to be 80%, which means there is an eight-in-ten chance of detecting a difference of the specified effect size (3).

Power can be characterized as follows (4):
1) the probability of finding a true effect (true significance);
2) a true positive;
3) 1 − β, where β is the probability of not finding significance when it is there (a false negative), i.e. the probability of a Type II error;
4) usually set to 0.80.

Power should not be less than 80%, and the Type II error rate is inversely related to sample size.
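To make the factors above concrete, here is a minimal Python sketch (not part of the original article) that approximates the power of a two-group comparison of proportions using a normal-approximation z-test. The helper name two_proportion_power, the sample size of 120 per arm, and the 5% two-sided alpha are illustrative assumptions; the outcome rates reproduce the 30% vs 45% and 10% vs 15% scenarios mentioned above.

```python
# A minimal sketch (assumed values, not from the article): approximate power of a
# two-sided, two-proportion z-test via the normal approximation.
from scipy.stats import norm

def two_proportion_power(p1, p2, n_per_group, alpha=0.05):
    """Approximate power to detect a difference between p1 and p2 with n per group."""
    p_bar = (p1 + p2) / 2                                    # pooled proportion under H0
    se0 = (2 * p_bar * (1 - p_bar) / n_per_group) ** 0.5     # standard error under H0
    se1 = (p1 * (1 - p1) / n_per_group +
           p2 * (1 - p2) / n_per_group) ** 0.5               # standard error under H1
    z_crit = norm.ppf(1 - alpha / 2)                         # two-sided critical value
    # Probability the observed difference clears the rejection threshold under H1
    # (the negligible opposite tail is ignored in this approximation).
    return norm.cdf((abs(p2 - p1) - z_crit * se0) / se1)

n = 120  # hypothetical sample size per arm
print(two_proportion_power(0.30, 0.45, n))  # roughly 0.67: 30% vs 45% is easier to detect
print(two_proportion_power(0.10, 0.15, n))  # roughly 0.21: 10% vs 15% needs a larger study
```

With the same sample size per arm, the larger absolute difference is detected with far higher probability, which is the trade-off the three factors above describe.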

The power can also be defined as follows (5). The probability of not committing a Type II error is called the power of a hypothesis test. Increasing beta, the probability of a Type II error, reduces the power of the test (Figure 1). Increasing the sample size makes the hypothesis test more sensitive, that is, more likely to reject the null hypothesis when it is in fact false, and thus increases the power of the test. Increasing the significance level reduces the region of acceptance, which also makes the test more likely to reject the null hypothesis and therefore increases its power. Since, by definition, power is equal to one minus beta, the power of a test gets smaller as beta gets bigger. The effect size itself is not affected by the sample size, and the probability of making a Type II error gets smaller, not bigger, as the sample size increases.

Figure 1. Type I and Type II errors

β also depends on additional factors from your sample, such as the standard error and the effect size. The effect size is the raw distance from the null hypothesis that you are considering when you set the Type II error rate. In a coin-flipping scenario, the farther we get from 5 heads out of 10, the larger the effect size. β decreases when the effect size increases: if a new medicine causes someone's heart to explode (a large effect size), that is easier to notice than a small increase in blood pressure (a small effect size). 1 − β is called the power of the test; if we are considering a large effect, our test will have a lot of power to detect it and reject the null hypothesis.
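As a rough numerical illustration of these relationships (assumed values, not the article's own calculation), the sketch below computes the approximate power of a two-sided one-sample z-test for a few sample sizes and standardized effect sizes, showing that power = 1 − β grows, and β shrinks, as either the sample size or the effect size increases.

```python
# A minimal sketch (assumed values): power of a two-sided one-sample z-test,
# power = 1 - beta, as a function of sample size n and standardized effect
# size d = (mu1 - mu0) / sigma.
from math import sqrt
from scipy.stats import norm

def z_test_power(effect_size, n, alpha=0.05):
    z_crit = norm.ppf(1 - alpha / 2)      # rejection threshold under H0
    shift = effect_size * sqrt(n)         # distance of the true mean from H0, in SE units
    # Power = probability of landing in the rejection region when H1 is true
    return norm.cdf(shift - z_crit) + norm.cdf(-shift - z_crit)

for d in (0.2, 0.5):                      # small vs medium standardized effect
    for n in (25, 100, 400):              # increasing sample size
        p = z_test_power(d, n)
        print(f"d={d}, n={n}: power={p:.2f}, beta={1 - p:.2f}")
```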

β increases when the standard error increases. We cannot usually obtain a small Type I error rate and a small Type II error rate at the same time: when we make α small, we are less likely to reject the null hypothesis, which means we are more likely to fail to reject it, and failing to reject the null when we should is a Type II error.

α is the Type I error rate, the rate at which Type I errors occur. This means that if we test many true null hypotheses at once, a proportion α of them will be falsely rejected. In courts, for example, some people are wrongly convicted. A Type II error is the act of failing to reject the null hypothesis when it is false; β is the Type II error rate, the chance of failing to reject the null hypothesis when we should reject it (6,7).

Increasing the power of a study depends on the following main factors (8), each of which is illustrated numerically in the sketch below:
1. Increase alpha
2. Conduct a one-tailed test
3. Increase the effect size
4. Decrease random error
5. Increase the sample size

It should also be remembered that nonparametric tests have less power, and that categorizing continuous variables should be avoided because it reduces statistical power. Parametric tests have more power than their nonparametric counterparts if the assumptions are met; however, the distributional assumptions of parametric tests are often strict or undesirable, and deviations from them can lead to misleading results.

The ability of statistical tests to demonstrate that a difference or association of a particular magnitude would be unlikely if the null hypothesis were true, under circumstances in which the null hypothesis is actually false, is referred to as statistical power. The more statistical power you have in your research study, the more likely you are to obtain the evidence you are looking for if your research hypothesis is actually true (9).
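The sketch below is a toy illustration, under assumed baseline numbers, of how each of the five levers listed above raises the power of a simple one-sample z-test; the raw difference, standard deviation, sample size, and alpha values are arbitrary choices for demonstration, not figures from the article.

```python
# A toy illustration (assumed numbers): how each lever raises the power of a
# one-sample z-test for a raw mean difference `diff` with noise level `sd`.
from math import sqrt
from scipy.stats import norm

def power(diff, sd, n, alpha=0.05, one_tailed=False):
    z_crit = norm.ppf(1 - alpha if one_tailed else 1 - alpha / 2)
    shift = (diff / sd) * sqrt(n)          # standardized effect size times sqrt(n)
    return norm.cdf(shift - z_crit)        # opposite tail ignored (negligible here)

base = dict(diff=2.0, sd=10.0, n=50)       # baseline scenario: power ~0.29
print("baseline:          ", round(power(**base), 2))
print("1. larger alpha:   ", round(power(**base, alpha=0.10), 2))
print("2. one-tailed test:", round(power(**base, one_tailed=True), 2))
print("3. larger effect:  ", round(power(diff=4.0, sd=10.0, n=50), 2))
print("4. less random err:", round(power(diff=2.0, sd=5.0, n=50), 2))
print("5. larger sample:  ", round(power(diff=2.0, sd=10.0, n=200), 2))
```

Each change moves power in the same direction the list predicts: loosening alpha or using a one-tailed test shifts the rejection threshold, while a larger effect, less random error, or a larger sample moves the test statistic further from the null.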

It has to be remembered that multivariate techniques have emerged as a powerful tool for analysing data represented in terms of many variables. The main reason is that a series of univariate analyses carried out separately for each variable may, at times, lead to incorrect interpretation of the results, because univariate analysis does not consider the correlation or inter-dependence among the variables. As a result, during the last fifty years a number of statisticians have contributed to the development of several multivariate techniques. Today, these techniques are applied in many fields such as economics, sociology, psychology, agriculture, anthropology, biology and medicine. They are used in analyzing social, psychological, medical and economic data, especially when the variables concerned are assumed to be correlated with each other and when rigorous probabilistic models cannot be appropriately applied. The application of multivariate techniques in practice has accelerated in modern times because of the advent of high-speed electronic computers (10).

In conclusion, an underpowered study may not yield useful results and may unnecessarily put participants at risk. Overall, researchers can and should attend to power and calculate it at the beginning of the study. Moreover, computer programs are available to perform the calculation.

References

1) Statistical power of a study. http://www.med.uottawa.ca/sim/data/Study_Design_Power_e.htm, June 2015.
2) Eng J. Sample size estimation: how many individuals should be studied? Radiology 2003; 227:309-313.
3) McCrum-Gardner E. Sample size and power calculations made simple. International Journal of Therapy and Rehabilitation 2010; 17(1).
4) Hunt A. A Researcher's Guide to Power Analysis. OMDS, Utah State University. https://rgs.usu.edu/irb/files/uploads/A_Researchers_Guide_to_Power_Analysis_USU.pdf
5) Power of a Hypothesis Test. Stat Trek. http://stattrek.com/hypothesis-test/power-of-test.aspx?tutorial=ap
6) Taylor C. What Is the Difference Between Type I and Type II Errors? http://statistics.about.com/od/inferential-statistics/a/type-i-and-type-II-Errors.htm
7) Review: P-values, Type I and Type II Errors. http://www.sfu.ca/~jackd/Stat203/Wk07_3_Full.pdf
8) Grace-Martin K. 5 Ways to Increase Power in a Study. The Analysis Factor. http://www.theanalysisfactor.com/5-ways-to-increase-power-in-a-study
9) Dewberry C. Statistical Methods for Organizational Research. Taylor & Francis e-Library, 2005.
10) Kothari CR. Research Methodology. New Age International (P) Ltd, 2004.