BROUGHT TO YOU BY Biostatistics 3 Developed by Pfizer March 2018 This learning module is intended for UK healthcare professionals only. Job bag: PP-GEP-GBR-0986 Date of preparation March 2018.
Agenda I. Introduction II. Hypothesis testing III. Statistical significance IV. How to define Power V. Statistical tests
Introduction
Learning Objectives
To understand hypothesis testing and how to decide which type of hypothesis to use.
To understand statistical significance.
To learn the difference between alpha and beta and how they define Power.
To understand the different statistical tests and trials, and to decide which is best to use:
1. t-tests
2. Analysis of Variance (ANOVA)
3. Chi-Square
4. Meta-analysis and real world data
1 Hypothesis testing
Medical knowledge is constantly changing. As new information becomes available, changes in treatment, procedures, equipment and the use of drugs become necessary. The authors and editors have, as far as it is possible, taken care to ensure that the information given in this module is accurate and up to date at the time it was created. However, users are strongly advised to confirm that the information, especially with regard to drug usage, complies with current legislation and standards of practice.
Hypothesis testing
Hypothesis testing is the method used to evaluate the likelihood that differences between variables could be due to chance alone. There are two types of hypothesis:
1. Null Hypothesis (H0): assumes no effect of the intervention (e.g. a diuretic to reduce blood pressure) and assumes any difference is due to chance.
2. Alternate or Research Hypothesis (HA or H1): a definite statement of the relationship and inequality between two variables.
Hypothesis testing
Two Types of Research Hypotheses:
1. Non-directional (two-tailed) research hypothesis. HA: There IS a difference in the incidence of side effects with Drug A and Drug B.
2. Directional (one-tailed) research hypothesis. HA: There IS a lower/higher incidence of side effects with Drug A than with Drug B.
Hypothesis testing
Directionality and Probability
Significance level (α), which this module also calls the preset confidence level: the probability of rejecting the null hypothesis when there is NO difference. Where the area of rejection sits is determined by directionality.
In a one-tailed test, the entire area of rejection of the null hypothesis is contained in one tail of the distribution curve.
In a two-tailed test, the area of rejection of the null hypothesis is divided between the two tails of the distribution curve.
A one-tailed (directional) hypothesis makes the more specific claim. For the same α, however, a two-tailed (non-directional) test needs a more extreme test value to reject the null hypothesis, because each tail holds only α/2.
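The way the rejection area is placed in one tail or split between two can be illustrated numerically. The sketch below is not part of the original module: it assumes a standard normal test statistic (a z-test), and the helper name critical_z is hypothetical. It uses only the Python standard library.

```python
from statistics import NormalDist


def critical_z(alpha: float, tails: int) -> float:
    """Return the positive cutoff of a standard normal test statistic.

    Assumption: the test statistic follows a standard normal distribution.
    """
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    # A two-tailed test splits alpha equally between the two tails.
    tail_area = alpha / tails
    return NormalDist().inv_cdf(1 - tail_area)


one_tailed = critical_z(0.05, tails=1)
two_tailed = critical_z(0.05, tails=2)
print(f"one-tailed cutoff at alpha 0.05: {one_tailed:.3f}")
print(f"two-tailed cutoff at alpha 0.05: {two_tailed:.3f}")
```

With the same alpha, the two-tailed cutoff is further from zero because each tail holds only half of the rejection area.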
Hypothesis testing
Legal Analogy
Null Hypothesis: Innocent Until Proven Guilty
Research Hypothesis: Prosecuting Attorney's Case
Confidence Level: Proof Beyond a Reasonable Doubt
Hypothesis Testing: Justice and Protection of the Innocent
Statistical significance 2
Statistical significance
Statistical significance: the probability that an observed outcome of an experiment or trial is due to chance alone.
Hypothesis: The addition of Drug A to a regimen of beta blockers and ACE inhibitors for the treatment of congestive heart failure will improve survival and decrease hospitalizations.
Influence on outcomes: improved survival, decreased hospitalizations.
The hypothesis assumes that we have controlled for any and all factors, other than the administration of Drug A, that might account for any observed differences between the two study groups. Therefore, the influence on outcomes is whether or not patients receive Drug A as part of their daily drug regimen.
Statistical Level
p (significance level) = chance that study conclusions are wrong.
1-p (level of confidence) = chance that the study conclusions are correct.
However hard we try to control outside influences, we can never be 100% sure: there is always a chance of a false positive. The level of chance, or risk, that we are willing to take is known as the significance level, and is expressed mathematically as p.
So if we had calculated a p value of 0.01, we would say there is a 1% chance that our study conclusions are wrong, that is, that we have incorrectly found a difference that is due to some influence other than the study drug. The flip side of the coin is our level of confidence, or 1-p: we are 99% confident that our study conclusion is correct, that is, that a reduction in hospitalization rates was due to the effects of Drug A and not something else.
Strictly speaking, p is the probability of observing a difference at least as large as the one found if the null hypothesis were true; the wording above is a simplified reading.
Statistical Level
The significance level, expressed as α, is defined before study results are analysed: α = 0.05 or 0.01.
The level of significance for most research studies (which are non-directional, meaning that the investigators are simply looking for a difference between the treatment group and the control group) is set at either 0.05 or 0.01, with 0.05 being the most common.
After analysis of the study results, a p value is reported and compared to alpha:
p ≤ α = statistically significant
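The comparison of a reported p value with a preset alpha can be sketched in a few lines. This is an illustration only, not part of the original module: it assumes a two-sided z-test, the function names are hypothetical, and the test value of 2.0 is made up.

```python
from statistics import NormalDist


def two_sided_p(z: float) -> float:
    """Two-sided p value for a standard normal test statistic (assumed z-test)."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


def is_significant(p: float, alpha: float) -> bool:
    """Apply the decision rule: significant when p is at or below alpha."""
    return p <= alpha


# Hypothetical test value, chosen only for illustration.
p = two_sided_p(2.0)
print(f"p = {p:.4f}")
print("significant at alpha 0.05:", is_significant(p, 0.05))
print("significant at alpha 0.01:", is_significant(p, 0.01))
```

The same result can pass at alpha 0.05 and fail at alpha 0.01, which is why alpha has to be fixed before the results are analysed.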
Statistical significance
An appropriate α (e.g., 0.05) reduces the risk of not finding a difference (accepting H0) when there really is a difference.
Looking at the diagram, 0.01 has a smaller rejection zone than 0.05. So why don't all researchers stick to 0.01? The answer is that being too stringent protects against false positives but can also lead to missing some real effects.
[Diagram: rejection zones for α = 0.05 and α = 0.01, showing test values that accept H0 and test values that reject H0]
Confidence intervals
p values report the significance of study results. A p value <0.05 means there is less than a 5% chance that the study drug decreased BP more than placebo by random chance alone.
A p value tells us nothing about the magnitude of the difference between study drug and placebo.
Confidence intervals (CI) are defined by upper and lower confidence limits. Here, the lower limit is -21 mmHg and the upper limit is -7 mmHg: 95% CI (-21 mmHg, -7 mmHg).
A confidence interval tells us how confident we can be that the treatment effect observed in the study is representative of the true treatment effect in the population as a whole.
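A confidence interval of this kind can be sketched as the estimate plus or minus a margin. The example below is an assumption-laden illustration rather than the calculation behind the slide: it uses a normal approximation, and both the point estimate (-14 mmHg, the midpoint of the quoted interval) and the standard error (3.57 mmHg) are hypothetical values chosen so that the result lands close to the interval quoted above.

```python
from statistics import NormalDist


def confidence_interval(estimate: float, standard_error: float, level: float = 0.95):
    """Return (lower, upper) limits using a normal approximation."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z * standard_error
    return estimate - margin, estimate + margin


# Hypothetical inputs: mean BP change versus placebo and its standard error.
lower, upper = confidence_interval(-14.0, 3.57)
print(f"95% CI ({lower:.1f} mmHg, {upper:.1f} mmHg)")
```

Because the whole interval lies below zero, it excludes "no difference", and its width shows how precisely the size of the effect has been estimated.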
How to define Power 3
Power calculations
When planning a study you will often hear people discuss how powered a study needs to be. Power, in statistics, is the ability of a statistical test to show whether a significant difference really exists or, in other words, the probability that a statistical test will show a true difference when one exists.
The power is dependent on 2 factors:
1. The preset confidence level, or alpha (α)
2. The amount of difference between groups and the sample size
[Diagram: Power is influenced by Alpha (α), Size of Difference and Sample Size]
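How alpha, the size of the difference and the sample size interact can be sketched with a textbook approximation. This is not the module's own method: it assumes two equal-sized groups, a two-sided test, a normal approximation, and a difference expressed in standard deviation units. The function name and inputs are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate patients needed per group (normal approximation).

    effect_size is the difference between group means divided by the
    standard deviation (an assumption of this sketch).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # cutoff for the two-sided alpha
    z_beta = z.inv_cdf(power)           # cutoff for the desired power
    return ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


print(sample_size_per_group(0.5))              # moderate difference
print(sample_size_per_group(1.0))              # larger difference needs fewer patients
print(sample_size_per_group(0.5, alpha=0.01))  # stricter alpha needs more patients
```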
Power: alpha and beta
There are two important concepts to understand. First, alpha and beta are interrelated. All things being equal, as alpha is increased, beta is decreased; as alpha is decreased, beta is increased. In other words, alpha and beta always move in opposite directions.
Power = 1 - beta
Second, statistical power is defined as 1 - beta. Thus, mathematically, power indicates the probability of NOT making the mistake of not finding a difference when one actually exists (remember the definition of beta: the probability of not finding a difference when there is one).
Power: alpha and beta
Remember, if alpha goes up, beta goes down, and since power is 1 - beta, as beta gets smaller, power increases. The goal of hypothesis testing is always to minimize alpha and maximize power (or 1 - beta).
Alpha            0.01   0.05
Beta             0.20   0.10
Power (1-beta)   0.80   0.90
Example: Suppose we set our alpha, or confidence level, at 0.01, meaning we have accepted a 1% probability of making the mistake of finding a difference when there actually is no difference. We also decide to use a standard beta, or probability of NOT finding a difference when there IS a difference, of 0.20, or 20%. We then calculate a power of 0.80 or 80% (1 minus 0.20) for the study. Thus, the study is said to have 80% power.
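The opposite movement of alpha and beta can be checked numerically. The sketch below is an illustration under stated assumptions, not the module's calculation: it uses a normal approximation for a two-sided comparison of two equal-sized groups, ignores the negligible contribution of the opposite tail, and the effect size (0.5 standard deviations) and group size (64) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approximate_power(effect_size: float, n_per_group: int, alpha: float) -> float:
    """Approximate power (1 - beta) of a two-sided, two-group comparison."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Expected size of the test statistic when the difference is real.
    shift = effect_size * sqrt(n_per_group / 2)
    return z.cdf(shift - z_crit)


power_05 = approximate_power(0.5, 64, alpha=0.05)
power_01 = approximate_power(0.5, 64, alpha=0.01)
print(f"alpha 0.05: power {power_05:.2f}, beta {1 - power_05:.2f}")
print(f"alpha 0.01: power {power_01:.2f}, beta {1 - power_01:.2f}")
```

Holding the difference and sample size fixed, lowering alpha raises beta and therefore lowers power.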
Power: alpha and beta
Power (Statistical) = Sensitivity (Diagnostic)
Power: the probability that a statistical test finds a difference when a difference is present.
Sensitivity: the probability that a diagnostic test finds disease when disease is present.
Statistical tests 4
Statistical tests The three most basic types of statistical tests are 1. t-tests 2. Analysis of Variance (ANOVA) 3. Chi-Square
t-tests
Used to determine the significance of the difference between the means of two groups.
t-tests can be either paired or unpaired.
A paired t-test examines the difference in means within the same individuals in a study group (for example, before and after treatment).
An unpaired t-test looks at the difference in means of two separate groups.
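The difference between the paired and unpaired calculations can be sketched with the standard library. This is a minimal illustration, not the module's own material: the unpaired version assumes equal variances (Student's pooled t-test), the function names are hypothetical, and the blood pressure readings are made up. Turning the t value into a p value needs a t-distribution table or a statistics package, which is left out here.

```python
from math import sqrt
from statistics import mean, stdev, variance


def paired_t(before, after):
    """t statistic and degrees of freedom for measurements on the same individuals."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


def unpaired_t(group_a, group_b):
    """t statistic and degrees of freedom for two separate groups (equal variances assumed)."""
    na, nb = len(group_a), len(group_b)
    pooled = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / (na + nb - 2)
    t = (mean(group_a) - mean(group_b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


# Hypothetical systolic BP readings (mmHg) for four patients, before and after treatment.
before = [150, 160, 155, 165]
after = [140, 152, 150, 154]
t_paired, df_paired = paired_t(before, after)
print(f"paired t = {t_paired:.2f} with {df_paired} degrees of freedom")
```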
t-tests
t-tests can be one-tailed or two-tailed.
One-tailed t-test: used to test directional hypotheses. In a one-tailed t-test, the entire area of rejection of the null hypothesis is contained in one tail of the distribution curve.
Two-tailed t-test: used to test non-directional hypotheses. In a two-tailed t-test, the area of rejection of the null hypothesis is divided between the two tails of the distribution curve.
Analysis of variance (ANOVA)
Used to compare results in three or more groups, that is, to compare three or more means.
Example: Suppose we wanted to determine if there were statistical differences between Antibiotic A and Antibiotic B on (1) hospital length of stay, (2) bacteriological cure, (3) days of therapy until infection resolution, and (4) adverse reactions. Unfortunately, we cannot legitimately do individual t-tests on each variable because, in doing so, we mathematically increase our chances of finding a false positive. By doing multiple t-tests on the same data, an acceptable p value of 0.05 would balloon to unacceptable levels. With four t-tests, as in our case, the overall chance of a false positive goes from 0.05 to about 0.19, which would be too large a margin of error. In other words, there would be about a 19% chance of finding at least one difference between the treatment groups that is due to chance alone.
[Diagram: Antibiotic A vs. Antibiotic B, with a separate t-test for each of Hospital Length of Stay, Bacteriological Cure, Days to Infection Resolution and Adverse Reactions]
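The jump from 0.05 to roughly 0.19 across four tests can be reproduced directly. The sketch below assumes the tests are independent, which is a simplification, and the function names are hypothetical. It also shows the Bonferroni-style adjustment of alpha as one common remedy.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test alpha that keeps the overall false positive rate near alpha."""
    return alpha / n_tests


rate = familywise_error_rate(0.05, 4)
print(f"overall false positive chance with 4 tests: {rate:.2f}")
print(f"Bonferroni-adjusted alpha per test: {bonferroni_alpha(0.05, 4):.4f}")
```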
ANOVA
Whilst the significance (F) gleaned from an ANOVA describes whether there is an overall difference between the groups, it does not tell you where the difference is.
Example: Suppose you were to analyse blood pressure in males and females who were grouped by sex and age. The ANOVA would tell you if there was a difference between groups, but you would need a further test to isolate whether age or sex, alone or together, caused this difference. This form of testing is called a post hoc analysis.
Post hoc analysis tests: Bonferroni test, Tukey test and Scheffe test
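The F value from a one-way ANOVA is the ratio of the variation between group means to the variation within groups. The sketch below is a minimal hand calculation for illustration, not the module's own method: the function name is hypothetical and the three small groups are made-up numbers. It returns only the F statistic and degrees of freedom; looking up the p value needs an F table or a statistics package.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of measurements."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Variation of the group means around the overall mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical measurements for three groups.
f, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
print(f"F = {f:.2f} with ({df_b}, {df_w}) degrees of freedom")
```

A large F says that at least one group mean differs, but not which one, so a post hoc test is still needed to locate the difference.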
Chi-Square (χ²)
Used for nonnumeric variables, e.g., frequencies, proportions, categories.
Example: Does the efficacy of a new antidepressant drug differ by disease severity?
Efficacy in mild depression
Efficacy in moderate depression
Efficacy in severe depression
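A chi-square statistic compares the observed counts in each category with the counts expected if there were no association. The sketch below is illustrative only: the function name is hypothetical and the counts of responders and non-responders by depression severity are made up, not taken from any study.

```python
def chi_square_statistic(table):
    """Return (chi-square, degrees of freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if efficacy did not depend on severity.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Hypothetical counts. Columns: mild, moderate, severe depression.
observed = [
    [30, 25, 15],  # responded to the antidepressant
    [10, 15, 25],  # did not respond
]
chi2, df = chi_square_statistic(observed)
print(f"chi-square = {chi2:.1f} with {df} degrees of freedom")
```

For 2 degrees of freedom the usual cutoff at alpha 0.05 is 5.991, so a statistic above that would count as statistically significant.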
Meta-analysis
A meta-analysis is a quantitative statistical analysis that is applied to separate but similar experiments of different and usually independent researchers and that involves pooling the data and using the pooled data to test the effectiveness of the results.
This is especially useful when there have been several small studies looking at one thing and you want to see what the effect is like in a larger population. For example, a report on cholesterol levels presented a comprehensive meta-analysis of 32 randomized studies involving 42,000 individuals.
There are some caveats to a meta-analysis: the studies were most likely carried out at different times, with differing protocols, etc. Using statistical methods, one can identify features which may lead to false positives or negatives and control for them. These features are known as variables. Once the variables are controlled for, the researcher can pool the data and then run the statistical analysis using a type of ANOVA.
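One simple way to picture pooling is to average the study results while giving more weight to the more precise studies. The sketch below uses fixed-effect inverse-variance weighting, a commonly used technique that is swapped in here for illustration; it is not the ANOVA-based approach described above, the function name is hypothetical, and the study estimates are made up.

```python
from math import sqrt


def pool_fixed_effect(estimates, standard_errors):
    """Return (pooled estimate, pooled standard error) by inverse-variance weighting."""
    # More precise studies (smaller standard error) receive larger weights.
    weights = [1 / se ** 2 for se in standard_errors]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    pooled_se = sqrt(1 / sum(weights))
    return pooled, pooled_se


# Hypothetical treatment effects from two small studies and their standard errors.
pooled, pooled_se = pool_fixed_effect([1.0, 3.0], [0.5, 1.0])
print(f"pooled effect = {pooled:.2f}, standard error = {pooled_se:.2f}")
```

The pooled standard error is smaller than that of either study, which is the sense in which pooling approximates a larger population.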
Randomised controlled trials
Before a medicine is granted a licence it undergoes rigorous testing to establish various features, such as efficacy and safety. The trials which test this are known as randomised controlled trials. These are very stringent, testing a specific population for specific outcomes. Inclusion and exclusion criteria are established so that a specific population is recruited to be tested. More often than not, ANOVAs will be the basis of the statistical analyses. From the trial findings the medicine manufacturer can apply for a licence for the indications examined. A summary of these results with the licensed indication is then published in journals and, more succinctly, in the summary of product characteristics (SmPC), found on https://www.medicines.org.uk/emc.
In addition to randomised controlled trials, other forms of analysis may be carried out to establish different features of the medicine.
One might want to group all the randomised trials carried out in each country; to do so, one could carry out a meta-analysis.
One may want to test the new medicine in comparison to the gold standard medicine; to do so, one could carry out a non-inferiority or superiority study. As the name suggests, a non-inferiority study is used to show that, in efficacy for example, the new medicine is not meaningfully worse than the standard, and a superiority study allows the researcher to establish whether one is more efficacious than the other.
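The distinction between non-inferiority and superiority is often framed in terms of a confidence interval and a pre-agreed margin. The sketch below is a simplified illustration under stated assumptions, not a regulatory method: it uses a normal approximation, treats a positive difference as favouring the new medicine, and the function name, margin and input values are all hypothetical.

```python
from statistics import NormalDist


def classify_comparison(difference: float, standard_error: float,
                        margin: float, level: float = 0.95) -> str:
    """Classify new-minus-standard efficacy using the lower confidence limit."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    lower = difference - z * standard_error
    if lower > 0:
        return "superior"        # whole interval favours the new medicine
    if lower > -margin:
        return "non-inferior"    # not worse than the standard by more than the margin
    return "not shown"


# Hypothetical differences in efficacy, with a non-inferiority margin of 2 units.
print(classify_comparison(0.5, 1.0, margin=2.0))
print(classify_comparison(3.0, 1.0, margin=2.0))
print(classify_comparison(-1.0, 1.0, margin=2.0))
```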
Real life data
The buzz word in terms of data at the moment appears to be real life or real world data. But what does this mean and how clinically useful is it?
Real life data is a form of non-interventional data collected and tested after the medicine is licensed. It looks at patients prescribed the medicine in real life, without the constraints of a randomised controlled trial.
[Diagram: study types ranging from Registration RCTs, through Long-term Phase III and Pragmatic RCTs, to Real life studies. Study design runs from Constrained to Free; ecology of care runs from Narrow to Broad]
This allows us to see the effectiveness of a product when challenged with everyday problems such as patient adherence, concomitant medication and patient lifestyle.
Contact Us For general inquiries or information about Pfizer medicines, you can contact Pfizer on 01304 616161.
5 Questions
Question 1
When testing the hypothesis, an alternate hypothesis (HA or H1):
a) Assumes any difference is due to chance
b) Is a definite statement of the relationship and inequality between two variables
c) Assumes no effect of the intervention, e.g., a diuretic to reduce blood pressure
Question 2
In terms of p value, which level of significance is more stringent?
a) 0.05
b) 0.01
Question 3
Which factors does power depend on? Select all that apply:
a) The preset confidence level, or alpha (α)
b) The amount of difference between groups and the sample size
Question 4
After analysing the data by ANOVA, you can carry out a post hoc test. What are these tests called? Select all that apply:
a) Bonferroni test
b) Tukey test
c) Chicken test
d) Scheffe test
Question 5 Meta-analysis uses data pooled from several previous studies a) True b) False
Certificate of completion Name: Completed Biostatistics Module 3