Understanding Statistics for Research Staff!

Similar documents
Psychology Research Process

Psychology Research Process

Welcome to this third module in a three-part series focused on epidemiologic measures of association and impact.

A Brief (very brief) Overview of Biostatistics. Jody Kreiman, PhD Bureau of Glottal Affairs

INTERPRETATION OF STUDY FINDINGS: PART I. Julie E. Buring, ScD Harvard School of Public Health Boston, MA

Evidence-Based Medicine Journal Club. A Primer in Statistics, Study Design, and Epidemiology. August, 2013

Table of Contents. Plots. Essential Statistics for Nursing Research 1/12/2017

Biases in clinical research. Seungho Ryu, MD, PhD Kanguk Samsung Hospital, Sungkyunkwan University

Introduction to statistics Dr Alvin Vista, ACER Bangkok, 14-18, Sept. 2015

Lecture Outline. Biost 590: Statistical Consulting. Stages of Scientific Studies. Scientific Method

Business Statistics Probability

Data that can be classified as belonging to a distinct number of categories >>result in categorical responses. And this includes:

Bias and confounding. Mads Kamper-Jørgensen, associate professor, Section of Social Medicine

Sheila Barron Statistics Outreach Center 2/8/2011

HYPOTHESIS TESTING 1/4/18. Hypothesis. Hypothesis. Potential hypotheses?

Main objective of Epidemiology. Statistical Inference. Statistical Inference: Example. Statistical Inference: Example

Glossary From Running Randomized Evaluations: A Practical Guide, by Rachel Glennerster and Kudzai Takavarasha

Lecture Outline Biost 517 Applied Biostatistics I. Statistical Goals of Studies Role of Statistical Inference

University of Wollongong. Research Online. Australian Health Services Research Institute

POST GRADUATE DIPLOMA IN BIOETHICS (PGDBE) Term-End Examination June, 2016 MHS-014 : RESEARCH METHODOLOGY

Confounding and Interaction

Still important ideas

Biases in clinical research. Seungho Ryu, MD, PhD Kanguk Samsung Hospital, Sungkyunkwan University

Statistics as a Tool. A set of tools for collecting, organizing, presenting and analyzing numerical facts or observations.

PTHP 7101 Research 1 Chapter Assignments

In the 1700s patients in charity hospitals sometimes slept two or more to a bed, regardless of diagnosis.

PRINCIPLES OF STATISTICS

SEM: the precision of the mean of the sample in predicting the population parameter and is a way of relating the sample mean to population mean

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo

Still important ideas

Welcome to this series focused on sources of bias in epidemiologic studies. In this first module, I will provide a general overview of bias.

HOW STATISTICS IMPACT PHARMACY PRACTICE?

Confounding Bias: Stratification

Overview. Goals of Interpretation. Methodology. Reasons to Read and Evaluate

15.301/310, Managerial Psychology Prof. Dan Ariely Recitation 8: T test and ANOVA

The degree to which a measure is free from error. (See page 65) Accuracy

17/10/2012. Could a persistent cough be whooping cough? Epidemiology and Statistics Module Lecture 3. Sandra Eldridge

sickness, disease, [toxicity] Hard to quantify

Biases in clinical research. Seungho Ryu, MD, PhD Kanguk Samsung Hospital, Sungkyunkwan University

Reflection Questions for Math 58B

Common Statistical Issues in Biomedical Research

Biostatistics for Med Students. Lecture 1

Basic Biostatistics. Chapter 1. Content

What you should know before you collect data. BAE 815 (Fall 2017) Dr. Zifei Liu

Inferential Statistics

9 research designs likely for PSYC 2100

Biostatistics. Donna Kritz-Silverstein, Ph.D. Professor Department of Family & Preventive Medicine University of California, San Diego

Today: Binomial response variable with an explanatory variable on an ordinal (rank) scale.

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo

Confounding and Bias

Readings: Textbook readings: OpenStax - Chapters 1 11 Online readings: Appendix D, E & F Plous Chapters 10, 11, 12 and 14

Two-sample Categorical data: Measuring association

Learning Objectives 9/9/2013. Hypothesis Testing. Conflicts of Interest. Descriptive statistics: Numerical methods Measures of Central Tendency

9/4/2013. Decision Errors. Hypothesis Testing. Conflicts of Interest. Descriptive statistics: Numerical methods Measures of Central Tendency

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo

To evaluate a single epidemiological article we need to know and discuss the methods used in the underlying study.

Logistic Regression Predicting the Chances of Coronary Heart Disease. Multivariate Solutions

Midterm Exam ANSWERS Categorical Data Analysis, CHL5407H

Chapter 11. Experimental Design: One-Way Independent Samples Design

Fundamental Clinical Trial Design

2-Group Multivariate Research & Analyses

investigate. educate. inform.

Statistics Assignment 11 - Solutions

INTERNAL VALIDITY, BIAS AND CONFOUNDING

Research Questions, Variables, and Hypotheses: Part 2. Review. Hypotheses RCS /7/04. What are research questions? What are variables?

3 CONCEPTUAL FOUNDATIONS OF STATISTICS

Observational Medical Studies. HRP 261 1/13/ am

Making comparisons. Previous sessions looked at how to describe a single group of subjects However, we are often interested in comparing two groups

An Introduction to Bayesian Statistics

Methodology for Non-Randomized Clinical Trials: Propensity Score Analysis Dan Conroy, Ph.D., inventiv Health, Burlington, MA

Interpretation of Epidemiologic Studies

Formulating Research Questions and Designing Studies. Research Series Session I January 4, 2017

Essential Skills for Evidence-based Practice: Statistics for Therapy Questions

FOREVER FREE STOP SMOKING FOR GOOD B O O K L E T. StopSmoking. For Good. Life Without Cigarettes

Simple Sensitivity Analyses for Matched Samples Thomas E. Love, Ph.D. ASA Course Atlanta Georgia

Choosing the Correct Statistical Test

Bias. A systematic error (caused by the investigator or the subjects) that causes an incorrect (overor under-) estimate of an association.

BIOSTATISTICS. Dr. Hamza Aduraidi

Research Designs and Potential Interpretation of Data: Introduction to Statistics. Let s Take it Step by Step... Confused by Statistics?

Gathering. Useful Data. Chapter 3. Copyright 2004 Brooks/Cole, a division of Thomson Learning, Inc.

Assessment Schedule 2015 Mathematics and Statistics (Statistics): Evaluate statistically based reports (91584)

In this chapter we discuss validity issues for quantitative research and for qualitative research.

Brown University Department of Emergency Medicine Providence, RI

Evaluation: Controlled Experiments. Title Text

GLOSSARY OF GENERAL TERMS

Sanjay P. Zodpey Clinical Epidemiology Unit, Department of Preventive and Social Medicine, Government Medical College, Nagpur, Maharashtra, India.

Lecture II: Difference in Difference. Causality is difficult to Show from cross

Readings: Textbook readings: OpenStax - Chapters 1 13 (emphasis on Chapter 12) Online readings: Appendix D, E & F

Measuring association in contingency tables

Statistical Techniques. Masoud Mansoury and Anas Abulfaraj

Economics 345 Applied Econometrics

Hierarchy of Statistical Goals

Epidemiology: Overview of Key Concepts and Study Design. Polly Marchbanks

Outcome Measure Considerations for Clinical Trials Reporting on ClinicalTrials.gov

City, University of London Institutional Repository

AMSc Research Methods Research approach IV: Experimental [2]

Overview of Clinical Study Design Laura Lee Johnson, Ph.D. Statistician National Center for Complementary and Alternative Medicine US National

INTRODUCTION TO STATISTICS SORANA D. BOLBOACĂ

Understandable Statistics

5.3: Associations in Categorical Variables

Transcription:

Statistics for Dummies? Understanding Statistics for Research Staff! Those of us who DO the research, but not the statistics. Rachel Enriquez, RN PhD Epidemiologist

Why do we do Clinical Research?

Epidemiology Epidemiology focuses on why we do clinical research. Official definition: The study of the distribution of determinants of disease and response to treatment in populations. The association of these determinants with outcomes is not random. The application of studies to control the effects of disease in population.

If it s not random, then statistical analysis may be able to find it. Statistics is the science of collecting, organizing, summarizing, analyzing, and making inference from data. Descriptive statistics includes collecting, organizing, summarizing, and presenting data. Inferential statistics includes making inferences, hypothesis testing, and determining relationships.

Understanding Statistics Population Description Inference BIG WORDS Significant Valid No formulas Focus on frequentist

Statisticians need DATA Qualitative Data Categorical Sex Diagnosis Anything that s not a # Rank (1 st, 2 nd, etc) Quantitative Data Something you measure Age Weight Systolic BP Viral load

Data Comes from a Population In clinical research the population of interest is typically human. The population is who you want to infer to We sample the population because we can t measure everybody. Our sample will not be perfect. True random samples are extremely rare Random sampling error

Describing the Population Characteristic N % Grand Region East 495 35.5 Middle 535 38.4 West 360 25.8 Mean SD BMI (median = 40) 41.8 9.5 Age 36.4 13 Table 1 in Papers Frequencies for categorical Central tendency for continuous Mean / median / mode Dispersion SD / range / IQR Distribution Normal (bell shaped) Non-normal (hospital LOS) Small numbers / nonnormal data Non-parametric tests

Descriptive Rates Numerator Denominator Time 2005 N = 4 / 12 Rate = 0.3 3/10 30/100 300/1,000

Significance of Descriptions A description cannot have statistical significance. It s not being compared to anything It is an estimate of reality. The estimate has a measure of dispersion (SD) We can calculate a confidence interval for the estimate that considers sampling error.

Inferential Statistics We are comparing things Point estimate to reference point two groups or two variables. Is bone density associated with HgBA1C? Are children hospitalized with bronchiolitis more likely to have a family history of asthma? Ex- Ex+ Dz- 45 20 Dz+ 60 70 Chi sq = 8.4 / p = 0.004

Need a Hypothesis What are we looking for? A priori How do we phrase the hypothesis? Exposure: independent variable, risk factor, experimental intervention. Outcome: dependent variable, disease, event, survival. Be specific Fishing expedition = bad

Statisticians Require Precise Statement of the Hypothesis H 0 : There is no association between the exposure of interest and the outcome H 1 : There is an association between the exposure and the outcome. This association is not due to chance. The direction of this association is not typically assumed.

Basic Inferences Correlation Pack years of smoking is positively associated with younger age of death. (R square) Association Smokers die, on average, five years earlier than nonsmokers. Smokers are 8 X more likely to get lung cancer than non-smokers.

Measure of Effect Risk Ratio / Odds Ratio / Hazards Ratio Not the same thing, but close enough. Calculate point estimate and confidence interval of the risk associated with an exposure. Smoking Drug X RR = Rate of disease among smokers Rate of disease among non-smokers If Rate ratio = 1 There is no relationship between the exposure and the outcome This is the null value (remember null hypothesis?)

AGAIN point estimate and confidence interval 95% confidence interval normally distributed statistic sample and measurements are valid

Interpreting Measures of Effect RR = 1: No Association RR >1: Risk Factor RR <1: Protective Factor

Crude vs Adjusted Analyses Crude analysis we only look at exposure and outcome. Adjusted analysis we adjust for potential confounding variables The existence of confounding obscures the true relationship between exposure and outcome. We can control for confounding by adjusting for confounding variables using statistical models.

Illustrating Confounding Drinking Coffee Drinking Coffee Pancreatic Cancer Pancreatic Cancer The relationship between coffee drinking and pancreatic cancer is confounded by cigarette smoking. Smoking Cigarettes

Statistical Model Holding the effect of cigarette smoking constant what is the relationship between coffee and pancreatic cancer? Multivariable MODELS can adjust for multiple confounders at the same time (if you have enough data). RR non-smokers = 1.4 RR crude = 4 RR RR 1-10 cigs/day = 1.4 adjusted = 1.4 RR 11-20 cigs/day = 1.4

Estimate the Independent Effect of the Exposure

Models can be very Complicated Complications include Adjusting for multiple factors Categorical / continuous / ordinal predictors and outcomes. Correlated data Time to event data Time effects Interaction terms Typically the goal remains the same: Measure the effect of an exposure on an outcome.

P value? We can make a point estimate and a confidence interval. What s a p value? Significant p value is an arbitrary number. Does NOT measure the strength of association. Measures the likelihood that the observed estimate is due to random sampling error. P < 0.05 is, by convention, an indication of statistical significance.

Statistical Significance Clinical Significance

If you have cancer, which result do you want? Mean = 1.4 SD = 0.1 P <0.0005 Mean = 4 SD = 1.5 P = 0.051

Hypothesis testing Uses the p value Or, does the confidence interval include the null value? Looking at a, b, and c which p value is: p = 0.8 p = 0.047 p = 0.004 CI is better than p value. Figure 1. Risk of adverse pregnancy outcomes among women with asthma. a b c

Testing Hypotheses We can produce a point estimate and a confidence interval. We can calculate a p value. What statistical tests do we use? Ask a statistician / take a class There are many methods T test Chi squared F test Etc.

Hypothesis Tests are not Perfect So, if I test the hypothesis and get an answer, then I know the answer- right? Hah we need jobs! P >0.05 P < 0.05 No association Correct decision Type 1 α error Association Type II β error Correct decision

What Affects Statistical Power? Beta error = low statistical power. The total sample size and the within group sample size Small groups have lower power Unbalanced groups have lower power The effect size It s easier to find a LARGE effect.

Examples of Alpha Error Bad luck Unavoidable (5% of the time)

The Fishing Expedition Type I alpha error is due to random chance associated with sampling. If you perform multiple tests, the chance of random error increases. A priori hypothesis Adjust the level of significance downward.

Sources of Error Measurement error My exposure is ANTI-OXIDANTS. My outcome is DEPRESSION. Misclassification Unmeasured confounding Other dietary factors (recent research shows that selenium is associated with depression). BMI Age Disease sick people may follow a special diet and may also get depressed.

Some More Error Selection Bias Recall Bias Observer Bias Error (bias) can Cause me to find a false association. Prevent me from finding a true association.

Validity Statistics can provide a point estimate a confidence interval. These estimates don t mean a thing if they aren t valid.

Internal Validity Internal validity I have managed measurement and study design error. I have used the right statistical methods Clinical research staff have a big role here. There is a lot we can do to improve data Other problems cannot be eliminated.

Validity External validity Do the results of my study apply to other people who were not in my study. Is the study generalizable?

Randomized Trials These are a special case. Confounding is mostly controlled by randomization. Measurement is standardized. Exposures and outcomes are strictly defined.

Frequentist Folly Statistical significance is based on sampling error. Sampling error is probably my smallest source of error. Unless I ve done a very good experiment. Rev. Thomas Bayes

Bayesian Statistics Call a real statistician Assume you know something about the problem before you start to study it. Be conservative in you prior estimates. Do your study, and stop and look at your data frequently. Now, based on the new data, where should we place our new best estimate of the truth?

Fun for Decades to Come You will have a job Because real research involves error Medical knowledge changes Statisticians need data Custom databases (MS access) Bioinformatics (SAS / SQL) Statistics