Lecture Outline. Biost 590: Statistical Consulting. Stages of Scientific Studies. Scientific Method

Similar documents
Biost 590: Statistical Consulting

Lecture Outline Biost 517 Applied Biostatistics I

Lecture Outline. Biost 517 Applied Biostatistics I. Purpose of Descriptive Statistics. Purpose of Descriptive Statistics

Lecture Outline Biost 517 Applied Biostatistics I. Statistical Goals of Studies Role of Statistical Inference

Biost 524: Design of Medical Studies

Fundamental Clinical Trial Design

Hierarchy of Statistical Goals

Biostatistics for Med Students. Lecture 1

Basic Biostatistics. Chapter 1. Content

investigate. educate. inform.

BIOSTATISTICAL METHODS

SISCR Module 7 Part I: Introduction Basic Concepts for Binary Biomarkers (Classifiers) and Continuous Biomarkers

STATISTICS AND RESEARCH DESIGN

Understanding Statistics for Research Staff!

PTHP 7101 Research 1 Chapter Assignments

MAKING THE NSQIP PARTICIPANT USE DATA FILE (PUF) WORK FOR YOU

INTRODUCTION TO STATISTICS SORANA D. BOLBOACĂ

Statistics: A Brief Overview Part I. Katherine Shaver, M.S. Biostatistician Carilion Clinic

EPI 200C Final, June 4 th, 2009 This exam includes 24 questions.

Ecological Statistics

Influence of Hypertension and Diabetes Mellitus on. Family History of Heart Attack in Male Patients

A Case Study: Two-sample categorical data

Common Statistical Issues in Biomedical Research

Analysis and Interpretation of Data Part 1

Biost 517: Applied Biostatistics I Emerson, Fall Homework #2 Key October 10, 2012

Overview. Goals of Interpretation. Methodology. Reasons to Read and Evaluate

Bayesian and Frequentist Approaches

From single studies to an EBM based assessment some central issues

Unit 1 Exploring and Understanding Data

The Statistical Analysis of Failure Time Data

MS&E 226: Small Data

Biostatistics II

Slide 1 - Introduction to Statistics Tutorial: An Overview Slide notes

Epidemiologic Methods I & II Epidem 201AB Winter & Spring 2002

Template 1 for summarising studies addressing prognostic questions

School of Population and Public Health SPPH 503 Epidemiologic methods II January to April 2019

Study Design STUDY DESIGN CASE SERIES AND CROSS-SECTIONAL STUDY DESIGN

Analysis of Environmental Data Conceptual Foundations: En viro n m e n tal Data

Epidemiology: Overview of Key Concepts and Study Design. Polly Marchbanks

A Brief Introduction to Bayesian Statistics

PubH 7405: REGRESSION ANALYSIS. Propensity Score

Introduction to diagnostic accuracy meta-analysis. Yemisi Takwoingi October 2015

Statistics as a Tool. A set of tools for collecting, organizing, presenting and analyzing numerical facts or observations.

Logistic Regression Predicting the Chances of Coronary Heart Disease. Multivariate Solutions

Evidence Based Medicine

Biostats Final Project Fall 2002 Dr. Chang Claire Pothier, Michael O'Connor, Carrie Longano, Jodi Zimmerman - CSU

General Biostatistics Concepts

Table of Contents. EHS EXERCISE 1: Risk Assessment: A Case Study of an Investigation of a Tuberculosis (TB) Outbreak in a Health Care Setting

STATS8: Introduction to Biostatistics. Overview. Babak Shahbaba Department of Statistics, UCI

Incorporating Clinical Information into the Label

Introduction to Machine Learning. Katherine Heller Deep Learning Summer School 2018

Choosing the Correct Statistical Test

What you should know before you collect data. BAE 815 (Fall 2017) Dr. Zifei Liu

Midterm Exam ANSWERS Categorical Data Analysis, CHL5407H

Welcome to this third module in a three-part series focused on epidemiologic measures of association and impact.

CHAPTER 3 DATA ANALYSIS: DESCRIBING DATA

Research Designs and Potential Interpretation of Data: Introduction to Statistics. Let s Take it Step by Step... Confused by Statistics?

BEST PRACTICES FOR IMPLEMENTATION AND ANALYSIS OF PAIN SCALE PATIENT REPORTED OUTCOMES IN CLINICAL TRIALS

Chapter 2. The Data Analysis Process and Collecting Data Sensibly. Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc.

Inferential Statistics

NEW METHODS FOR SENSITIVITY TESTS OF EXPLOSIVE DEVICES

HOW STATISTICS IMPACT PHARMACY PRACTICE?

Statistics and Epidemiology Practice Questions

BIOSTATISTICS. Dr. Hamza Aduraidi

Understandable Statistics

Chapter 1: Explaining Behavior

Biostatistics. Donna Kritz-Silverstein, Ph.D. Professor Department of Family & Preventive Medicine University of California, San Diego

Diagnostic methods 2: receiver operating characteristic (ROC) curves

Module Overview. What is a Marker? Part 1 Overview

Understanding the Hypothesis

Chapter 14: More Powerful Statistical Methods

Lec 02: Estimation & Hypothesis Testing in Animal Ecology

List of Figures. List of Tables. Preface to the Second Edition. Preface to the First Edition

Advanced IPD meta-analysis methods for observational studies

Learning Objectives 9/9/2013. Hypothesis Testing. Conflicts of Interest. Descriptive statistics: Numerical methods Measures of Central Tendency

Types of data and how they can be analysed

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo

Chapter 1. Introduction

BIOSTATISTICAL METHODS AND RESEARCH DESIGNS. Xihong Lin Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA

Evidence-Based Medicine Journal Club. A Primer in Statistics, Study Design, and Epidemiology. August, 2013

11/18/2013. Correlational Research. Correlational Designs. Why Use a Correlational Design? CORRELATIONAL RESEARCH STUDIES

Protocol Development: The Guiding Light of Any Clinical Study

CAN WE PREDICT SURGERY FOR SCIATICA?

Variable Data univariate data set bivariate data set multivariate data set categorical qualitative numerical quantitative

MULTIPLE LINEAR REGRESSION 24.1 INTRODUCTION AND OBJECTIVES OBJECTIVES

Introduction to statistics Dr Alvin Vista, ACER Bangkok, 14-18, Sept. 2015

A Brief (very brief) Overview of Biostatistics. Jody Kreiman, PhD Bureau of Glottal Affairs

PRINCIPLES OF STATISTICS

Recent developments for combining evidence within evidence streams: bias-adjusted meta-analysis

Online Supplementary Material

Applied Medical. Statistics Using SAS. Geoff Der. Brian S. Everitt. CRC Press. Taylor Si Francis Croup. Taylor & Francis Croup, an informa business

Ordinal Data Modeling

1 Introduction. st0020. The Stata Journal (2002) 2, Number 3, pp

Introduction to Meta-analysis of Accuracy Data

Survey research (Lecture 1) Summary & Conclusion. Lecture 10 Survey Research & Design in Psychology James Neill, 2015 Creative Commons Attribution 4.

Survey research (Lecture 1)

Psychology Research Process

Biostatistics and Epidemiology Step 1 Sample Questions Set 1

Psychology Research Process

Transcription:

Biost 590: Statistical Consulting Statistical Classification of Scientific Studies; Approach to Consulting Lecture Outline Statistical Classification of Scientific Studies Statistical Tasks Approach to a Consulting Problem April 8 and 15, 2005 Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics, University of Washington Biost 590, Spr 2005 1 2000, Scott S. Emerson, M.D., Ph.D. Biost 590, Spr 2005 2 Scientific Method Stages of Scientific Studies Observation Hypothesis generation Confirmatory studies Disadvantages: Confounding Limited ability to establish cause and effect Biost 590, Spr 2005 3 Biost 590, Spr 2005 4 Lecture 1: 1

Stages of Scientific Studies Experiment Intervention Elements of experiment Overall goal Specific aims (hypotheses) Materials and methods Collection of data Analysis Interpretation; Refinement of hypotheses Biost 590, Spr 2005 5 Scientific Method A well designed study discriminates between hypotheses The hypotheses should be the most important, viable hypotheses All other things being equal, it should be equally informative for all possible outcomes But may need to consider simplicity of experiments, time, cost Biost 590, Spr 2005 6 Addressing Variability Outcome measures rarely constant Inherent randomness Hidden (unmeasured) variables Probability Models Probability models are used as a basis for variability Distribution of measurements Summary measure for scientific tendency (Signal and noise) Biost 590, Spr 2005 7 Biost 590, Spr 2005 8 Lecture 1: 2

Role of Statistics Statistics is used to Describe tendencies of response Quantify uncertainty in conclusions Statistics means never having to say you are certain (ASA sweatshirt) Statistical Classification of Scientific Questions Biost 590, Spr 2005 9 Biost 590, Spr 2005 10 General Classification Prediction of individual observations Clustering of observations Clustering of variables Quantification of distributions Comparing distributions 1. Prediction Focus is on individual measurements Point prediction: Best single estimate for the measurement that would be obtained on a future individual Continuous measurements Binary measurements (discrimination) Interval prediction: Range of measurements that might reasonably be observed for a future individual Biost 590, Spr 2005 11 Biost 590, Spr 2005 12 Lecture 1: 3

Example: Continuous Prediction Creatinine clearance Creatinine Breakdown product of creatine Removed by the kidneys by filtration Little secretion, reabsorption Measure of renal function Amount of creatinine cleared by the kidneys in 24 hours Example: Continuous Prediction Problem: Need to collect urine output (and blood creatinine) for 24 hours Goal: Find blood, urine measures that can be obtained instantly, yet still provide an accurate estimate of a patient s creatinine clearance Biost 590, Spr 2005 13 Biost 590, Spr 2005 14 Example: Continuous Prediction Statistical Tasks: Training sample Measure true creatinine clearance Measure sex, age, weight, height, creatinine Statistical analysis Regression model that uses other variables to predict creatinine clearance Quantify accuracy of predictive model (Mean squared error?) Example: Discrimination Diagnosis of prostate cancer Use other measurements to predict whether a particular patient might have prostate cancer Demographic: Age, race, (sex) Clinical: Symptoms Biological: Prostate specific antigen (PSA) Goal is a diagnosis for each patient Biost 590, Spr 2005 15 Biost 590, Spr 2005 16 Lecture 1: 4

Example: Discrimination Statistical Tasks: Training sample Gold standard diagnosis Measure age, race, PSA Statistical analysis Regression model that uses other variables to predict prostate cancer diagnosis Quantify accuracy of predictive model (ROC curve analysis?) Example: Interval Prediction Determining normal range for PSA Identify the range of PSA values that would be expected in the 95% most typical healthy males Age, race specific values Biost 590, Spr 2005 17 Biost 590, Spr 2005 18 Example: Interval Prediction Comment About Prediction Statistical Tasks: Training sample Measure age, race, PSA Statistical analysis Regression model that uses other variables to define prediction interval (Mean plus/minus 2 SD?) (Confidence interval for quantiles?) Quantify accuracy of predictive model (Coverage probabilities?) Biost 590, Spr 2005 19 For me to consider a problem to be purely a prediction problem, interest must lie solely in the predicted value, and not in the way that value was obtained E.g., in weather prediction, we might just want to know the weather tomorrow We won t be trying to impress upon our audience the way it should be predicted I do not think this is very often the case Biost 590, Spr 2005 20 Lecture 1: 5

2. Cluster Analysis Focus is on identifying similar groups of observations Divide a population into subgroups based on patterns of similar measurements Univariate, multivariate Known or unknown number of clusters (All variables treated symmetrically: No delineation between outcomes and groups) Biost 590, Spr 2005 21 Example: Cluster Analysis Potential for different causes for the same clinical syndrome: Glucose in urine Identify patterns of measurements that separate subpopulations of patients with diabetes Age of onset Symptoms at onset (e.g., weight) Auto-antibodies Characteristics of epidemics Biost 590, Spr 2005 22 Example: Cluster Analysis Statistical Tasks: Training sample Measure age, change in weight, auto-antibodies, etc. Statistical analysis Cluster analysis Summarize variable distributions within identified clusters (Attach labels?) 3. Factor Analysis Identifying hidden variables indicating groups that tend to have similar measurements of some outcome Interest in some particular outcome measurement Predictors that imprecisely measure some abstract quality Desire to find patterns in predictors that more precisely reflect the abstract quality Biost 590, Spr 2005 23 Biost 590, Spr 2005 24 Lecture 1: 6

Example: Factor Analysis Identifying barriers to patient compliance in clinical trials In the Health Behavior Questionaire, multiple variables might be used to measure Self-perceived health; social support; depression Desire is to Find subset of questions that would suffice Identify hidden variables that affect compliance Example: Factor Analysis Statistical Tasks: Training sample Measure response to questionnaire Statistical analysis Factor analysis (principal components) Report contribution to factors, factor loadings (Attach labels?) (Draw conclusions about importance of latent variables?) Biost 590, Spr 2005 25 Biost 590, Spr 2005 26 4. Quantifying Distributions Focus is on distributions of measurements within a population Scientific questions about tendencies for specific measurements within a population Point estimates of summary measures Interval estimates of summary measures Quantifying uncertainty Decisions about hypothesized values Example: Estimate Proportions Proportion of women among patients with primary biliary cirrhosis Serious liver disease often leading to liver failure Unknown etiology Characterizing types of people who suffer from disease may provide clues about causes (About 90% of patients with PBC are women) Biost 590, Spr 2005 27 Biost 590, Spr 2005 28 Lecture 1: 7

Example: Estimate Proportions Statistical Tasks Sample of patients (from registry?) Measure demographics, etc. Statistical analysis Best estimate of the proportion Quantify uncertainty in that estimate Compare to the known proportion of women in the general population (approximately 50%)? Example: Estimation of Median Median life expectancy of patients newly diagnosed with stage II breast cancer Want to know prognosis Patients planning Judging public health risks Biost 590, Spr 2005 29 Biost 590, Spr 2005 30 Example: Estimation of Median Statistical Tasks Sample of patients with newly diagnosed with stage II breast cancer Follow for survival time (may be censored) Statistical analysis Best estimate of the median survival (K-M?) Quantify uncertainty in that estimate Compare to some clinically important time range (e.g., 10 years) Biost 590, Spr 2005 31 5. Comparing Distributions Comparing distributions of measurements across populations 5a. Identifying groups that have different distributions of some measurement 5b. Quantifying differences in the distribution of some measurement across predefined groups (effects or associations) 5c. Quantifying differences in effects across subgroups (interactions or effect modification) Biost 590, Spr 2005 32 Lecture 1: 8

5a. Identifying Groups Example: Identifying Groups Identifying groups that have different distributions of some measurement Focus is on some particular outcome measurement Identify groups based on other measurements E.g., quantifying distributions within subgroups E.g, stepwise regression models (cf: Cluster analysis where all measurements are treated symmetrically) Biost 590, Spr 2005 33 Chromosomal abnormalities associated with ovarian cancer Cytogenetic analysis of dividing cells identifies regions of the chromosomes with defects Cancer is caused by some defects, and cancer causes other defects Approximately 370 identifiable regions Which of the regions are the most promising to explore in more focused studies? Biost 590, Spr 2005 34 Example: Identifying Groups Statistical Tasks: Sample of cancer tissues Measure type of cancer (ovarian, melanoma, etc.) Measure chromosomal defects Statistical analysis Stepwise regression models of chromosomal abnormalities predicting cancer type (Use p values to rank interest in particular regions?) Example: Identifying Groups Risk factors for diabetes Variables most associated with diabetes risk may give clues about etiology and eventual prevention Biost 590, Spr 2005 35 Biost 590, Spr 2005 36 Lecture 1: 9

Example: Identifying Groups Statistical Tasks Sample subjects to measure risk factors and disease prevalence Cohort study Case-control study Statistical analysis Stepwise model building (Rank most interesting variables by p value?) Biost 590, Spr 2005 37 5b. Detecting Associations Associations between variables distributions of one variable differ across groups defined by another Existence of differences Direction of tendency of effect First, second order relationships in a summary measure Characterization of dose-response in a summary measure Biost 590, Spr 2005 38 Example: Detecting Association Effect of blood cholesterol levels on risk of heart attacks Understanding etiology of heart attacks may lead to prevention and/or treatment strategies Biost 590, Spr 2005 39 Example: Detecting Association Statistical tasks Measure risk factors, MIs on sample Cohort or case-control sample Statistical analyiss Regression model (possibly adjusted) Cohort: Incidence of MIs across cholesterol levels Case-control: Cholesterol levels across MI status (Comparison can be at many levels of detail) Quantify estimates, precision, confidence in decisions Biost 590, Spr 2005 40 Lecture 1: 10

Detecting Effect Modification Quantifying differences in effects across subgroups (interactions or effect modification) Existence of interaction Direction of interaction (synergy, antagonism) Quantification of exact relationship of interaction Example: Effect Modification Identifying whether effect of cholesterol on heart attacks differs by sex Comparing association between blood cholesterol level and incidence of heart attacks between sexes Quantify association in men Quantify association in women Compare measures of association Biost 590, Spr 2005 41 Biost 590, Spr 2005 42 Approach Common to #4 & #5 In answering each scientific question, statistics typically provides four numbers Best estimate Best can be defined by frequentist or Bayesian criteria Interval describing precision Confidence interval or Bayesian credible interval Quantification of belief in some hypothesis P value or Bayesian posterior probability Biost 590, Spr 2005 43 Statistical Tasks Biost 590, Spr 2005 44 Lecture 1: 11

Statistical Tasks Statistical considerations come into play in all stages of scientific studies Study Design Data analysis Descriptive statistics Inferential statistics Interpretation and reporting of results Statistical Tasks Study Design Biost 590, Spr 2005 45 Biost 590, Spr 2005 46 Scientific Hypotheses Usual statement: The intervention when given to the target population will tend to result in outcome measurements that are higher than, lower than, or about the same as an absolute standard, or measurements in a comparison group Refining Scientific Hypotheses Statistical hypotheses precisely define the intervention the outcome advise on precision of measurement the target population(s) covariates tend to (the standards for comparison) summary measures relevance of absolute or relative standards Biost 590, Spr 2005 47 Biost 590, Spr 2005 48 Lecture 1: 12

General Methods Choosing a general method for collecting data Observational Intervention Sampling Time Frame Time frame Cross-sectional sampling Real time Event time (e.g. at diagnosis or birth) Longitudinal sampling Prospective Retrospective Biost 590, Spr 2005 49 Biost 590, Spr 2005 50 Common Study Designs Cross-sectional studies (surveys) Cohort studies Case-control studies Interventional studies Cross-sectional Studies Surveys of subjects sampled from a population Real or event time Efficient for examining Common outcomes and risk factors Associations (not cause and effect) Biost 590, Spr 2005 51 Biost 590, Spr 2005 52 Lecture 1: 13

Cohort Studies Groups defined by risk factor Identified prospectively or retrospectively Followed longitudinally for outcome event(s) Efficient for examining Common outcomes Many different outcomes for same exposure Associations (not cause and effect) Biost 590, Spr 2005 53 Case-Control Studies Groups defined by some outcome event Characterize prior exposures Longitudinal study into the past Efficient for examining Rare outcomes Many different risk factors for same outcome Associations (not cause and effect) Biost 590, Spr 2005 54 Interventional Studies Subjects assigned to some intervention Ideally controlled, randomized Followed longitudinally for some outcome So a special case of a cohort study Efficient for examining Common outcomes Cause and effect An Aside: Ability to Detect Associations Biost 590, Spr 2005 55 Biost 590, Spr 2005 56 Lecture 1: 14

Definition of an Association Mathematical Definitions The distributions of two variables are not independent Independence: Equivalent definitions Probability of outcome and exposure is product of Overall probability of outcome, and Overall probability of exposure Distribution of exposure is the same across all outcome categories Distribution of outcome is the same across all exposure categories Biost 590, Spr 2005 57 Independence: Equivalent definitions Joint probability of outcome D and exposure E Pr (D = d 1, E = e 1 ) = Pr (D = d 1 )? Pr (E = e 1 ) Conditional probability of outcome given exposure Pr (D = d 1 E = e 1 ) = Pr (D = d 1 E = e 2 ) Conditional probability of exposure given outcome Pr (E = e 1 D = d 1 ) = Pr (E = e 1 D = d 2 ) Biost 590, Spr 2005 58 Establishing Independence Consider all events defined by the two variables For each choice of d 1, d 2, e 1, e 2 show either Pr (D = d 1, E = e 1 ) = Pr (D = d 1 )? Pr (E = e 1 ), Pr (D = d 1 E = e 1 ) = Pr (D = d 1 E = e 2 ), or Pr (E = e 1 D = d 1 ) = Pr (E = e 1 D = d 2 ) It takes an infinite sample size to prove equality Detecting Associations Instead, we detect associations by showing that two variables are not independent Thus, we show that two distributions are different Biost 590, Spr 2005 59 Biost 590, Spr 2005 60 Lecture 1: 15

Summary Measures Generally we consider some summary measure of the distribution E.g., when we use the mean, we show an association by showing either E (D? E)? E (D)? E (E), E (D E = e 1 )? E (D E = e 2 ), or E (E D = d 1 )? E (E D = d 2 ) Justification This works, because if two distributions are the same, ALL summary measures should be the same If some summary measure is different, then we know the distributions are different Biost 590, Spr 2005 61 Biost 590, Spr 2005 62 Impact of Study Design To establish an association Cohort studies must examine whether Pr (D E = e 1 )? Pr (D E = e 2 ) Case-control studies must examine whether Pr (E D = d 1 )? Pr (E D = d 2 ) Cross sectional studies can examine either of the above, as well as whether Pr (D, E)? Pr (D)? Pr(E) Statistical Tasks Study Design (cont.) Biost 590, Spr 2005 63 Biost 590, Spr 2005 64 Lecture 1: 16

Treatment of Variables Materials Measure and compare distribution across groups (response variable in regression) Vary systematically (intervention) Control at a single level (fixed effects) Control at multiple levels (fixed or random effects) Stratified (blocked) randomization Measure and adjust (fixed or random effects) Treat as error Biost 590, Spr 2005 65 Subjects used for measurement Inclusion criteria Exclusion criteria Independent subjects versus matched groups Repeated measures designs Cross-over designs Biost 590, Spr 2005 66 Statistical Hypotheses Selection of summary measure from distribution Characterize measurements within a population Characterize differences between groups Selection of groups Covariates in a regression model Hypotheses in terms of summary measure Biost 590, Spr 2005 67 Univariate Summary Measures Many times, statistical hypotheses are stated in terms of summary measures for the distribution within groups Means (arithmetic, geometric, harmonic, ) Medians (or other quantiles) Proportion exceeding some threshold Odds of exceeding some threshold Time averaged hazard function (instantaneous risk) Biost 590, Spr 2005 68 Lecture 1: 17

Comparisons Across Groups Comparisons across groups then use differences or ratios Difference / ratio of means (arithmetic, geometric, ) Difference / ratio of proportion exceeding some threshold Difference / ratio of medians (or other quantiles) Ratio of odds of exceeding some threshold Ratio of hazard (averaged across time?) Based on Type of Data Binary or dichotomous: mean (proportion); odds Nominal (unordered categories): frequencies; odds Ordinal (ordered categories): median (quantiles); odds;? mean Quantitative (addition makes sense): mean; median; proportion > c; hazards, Biost 590, Spr 2005 69 Biost 590, Spr 2005 70 Joint Summary Measures Other times groups are compared using a summary measure for the joint distribution Median difference / ratio of paired observations Probability that a randomly chosen measurement from one population might exceed that from the other Criteria for Summary Measure In order of importance Scientifically (clinically) relevant Also reflects current state of knowledge Is likely to vary across levels of the factor of interest Ability to detect variety of changes Statistical precision Only relevant if all other things are equal Biost 590, Spr 2005 71 Biost 590, Spr 2005 72 Lecture 1: 18

Probability Models Parametric E.g., Poisson, exponential, Weibull Semiparametric E.g., proportional hazards,? quasilikelihood Nonparametric E.g., Kolmogorov-Smirnov,? quasilikelihood Biost 590, Spr 2005 73 Statistical Models Dimension of response variable univariate, bivariate, multivariate Summary measure Relationship among observations Independent (uncorrelated) vs dependent (correlated) Number of groups One, two, K, infinite (regression) Biost 590, Spr 2005 74 Statistical Precision Choose sample size to ensure adequate precision to answer question Width of interval estimates (best) Statistical power Sample Size Calculation Choose sample size to ensure adequate precision to answer question Width of interval estimates (best) Statistical power Biost 590, Spr 2005 75 Biost 590, Spr 2005 76 Lecture 1: 19

Sample Size Calculation Sample size formulas used in wide variety of statistical settings n? n is the maximal sample size? 1 2?? V (???? aß is the alternative for which a standardized form of a level? test has power? 1 / V is the average statistical information contributed by each sampling unit 0 ) 2 Biost 590, Spr 2005 77 Logistical Constraints When sample size is constrained, either Determine power to detect a specified alternative Determine alternative detected with high power When using 95% CI, the alternative discriminated from the null is the alternative for which we have 97.5% power Biost 590, Spr 2005 78 Statistical Tasks Data Analysis Descriptive Statistics Description of a sample Identification of measurement or data entry errors Characterize materials and methods Validity of analysis methods Assess scientific and statistical assumptions (Straightforward estimates of effects-- inference) Hypothesis generation (inference-- estimation) Biost 590, Spr 2005 79 Biost 590, Spr 2005 80 Lecture 1: 20

Inference Generalizations from sample to population Estimation Point estimates Interval estimates Decision analysis (testing) Quantifying strength of evidence Inference: Point Estimation Identification of clusters Individual observations (predictions) Continuous measurements Categorical measurements (discrimination, classification) Summary measures of distributions Within a population Across populations Biost 590, Spr 2005 81 Biost 590, Spr 2005 82 Inference: Interval Estimates Quantifying uncertainty Individual observations Prediction intervals (continuous measurements) Accuracy (discrimination, classification) Summary measures Frequentist confidence intervals Bayesian credible intervals Inference: Decision Analysis Hypothesis testing Quantification of strength of evidence for a decision Frequentist p values Bayesian posterior probabilities Decision Biost 590, Spr 2005 83 Biost 590, Spr 2005 84 Lecture 1: 21

Approach to a Consulting Problem Biost 590, Spr 2005 85 Start at the Beginning Understand the scientific question Overall goal Current state of knowledge Specific aims of this experiment Scientific relevance of the experiment Why do we care? What will we do with the results If positive If negative Biost 590, Spr 2005 86 Can Statistics Help? Litmus Test # 1: Can Statistics Help? Litmus Test # 2: If the scientific question cannot be answered by an experiment when outcomes are entirely deterministic, there is NO chance that statistics can be of any help. If the scientific researcher cannot decide on an ordering of data distributions which would be appropriate when measurements are available on the entire population, there is NO chance that statistics can be of any help. Biost 590, Spr 2005 87 Biost 590, Spr 2005 88 Lecture 1: 22

Classify the Statistical Goal Type of question Prediction: point, interval Cluster analysis Factor analysis Quantifying distribution within group Comparing distributions across groups Level of detail Existence, direction, first order trends, precise dose response Biost 590, Spr 2005 89 Thought Experiment Think about ideal study design It is very useful to imagine what you would like to do (unconstrained by practicality or ethics) as a starting point This can then serve as a reference to help decide what can be done within the limitations of the actual problem Biost 590, Spr 2005 90 Where Are We in the Study Describe Sampling Methods Hypothesis generation (Interpretation of literature) Study design Statistical design Quality control Analysis of data Interpretation of results Source of data Location, time Selection criteria (inclusion, exclusion) Sample sizes specified by design Overall and within prespecified strata Sampling scheme may have specified number of smokers and nonsmokers in a cohort design Sampling scheme may reflect random process incidence of events is then estimable Response to reviewers Biost 590, Spr 2005 91 Biost 590, Spr 2005 92 Lecture 1: 23

Scientifically Classify Variables Demographic variables Measures of exposure Measures of concurrent disease Measures of severity of disease Measures of clinical outcomes etc. Biost 590, Spr 2005 93 Statistical Role of Variables Outcome (response) variable(s) Predictor(s) of interest (main grouping variable(s)) Subgroups of interest for effect modification Potential confounders Variables that add precision to analysis Often these are potential confounders, because they may be associated with predictor(s) of interest in sample Biost 590, Spr 2005 94 Descriptive Statistics Select Inferential Methods Tables, plots Identification of measurement or data entry errors Characterize materials and methods Available data Validity of analysis methods Assess scientific and statistical assumptions (Straightforward estimates of effects-- inference) Hypothesis generation (inference-- estimation) Biost 590, Spr 2005 95 Cluster analysis Factor analysis Prediction, quantifying, or comparing distributions Summary measures Comparison measures Probability models Statistical models, covariate adjustment Measures of accuracy and precision of analysis Biost 590, Spr 2005 96 Lecture 1: 24

Results of Analyses For each scientific question Point estimate Interval estimate Quantification of strength of evidence Interpretation Statistician as collaborator Contribution to manuscript (Introduction) Statistical methods Results Analysis Clarity of presentation Discussion Especially where statistical limitations play a role Biost 590, Spr 2005 97 Biost 590, Spr 2005 98 Lecture 1: 25