Biost 590: Statistical Consulting


Biost 590: Statistical Consulting
Statistical Classification of Scientific Questions
October 3, 2008
Scott S. Emerson, M.D., Ph.D., Professor of Biostatistics, University of Washington
© 2000, Scott S. Emerson, M.D., Ph.D.

Lecture Outline: Today
- Scientific Method and Statistics
- Types of Statistical Questions
  - Clustering cases
  - Clustering variables
  - Quantification of distributions
  - Detecting associations
  - Prediction

Lecture Outline: Next Week
- Statistical Tasks: Refining Hypotheses
- Study design
- Study structure
- Statistical analysis plan
- Sample size considerations

Scientific Method and Statistics

Stages of Scientific Studies
- Observation
  - Hypothesis generation
  - Confirmatory studies
  - Disadvantages:
    - Confounding
    - Limited ability to establish cause and effect

Stages of Scientific Studies
- Experiment
  - Intervention
  - Elements of experiment
    - Overall goal
    - Specific aims (hypotheses)
    - Materials and methods
    - Collection of data
    - Analysis
    - Interpretation; refinement of hypotheses

Scientific Method
- A well-designed study discriminates between hypotheses
  - The hypotheses should be the most important, viable hypotheses
- All other things being equal, it should be equally informative for all possible outcomes
  - But may need to consider simplicity of experiments, time, cost

Addressing Variability
- Outcome measures are rarely constant
  - Inherent randomness
  - Hidden (unmeasured) variables

Probability Models
- Probability models are used as a basis for variability
  - Distribution of measurements
  - Summary measure for scientific tendency
  - (Signal and noise)

Role of Statistics
- Statistics is used to
  - Describe tendencies of response
  - Quantify uncertainty in conclusions
- "Statistics means never having to say you are certain" (ASA sweatshirt)

Statistical Input at All Stages
- Understand overall goal
- Refine specific aims
  - Statistical statement of any hypotheses
- Materials and methods: study design
- Collection of data: advise on QC
- Analysis
  - Describe sample (materials and methods)
  - Analyses to address specific aims
- Interpretation

Classification: Scientific Goals
- Classification of studies by goals
  - Focus on sample
    - Description
  - Focus on population
    - Estimation
    - Generation of scientific hypotheses
    - Testing of scientific hypotheses

Classification: Statistical Goals
- Classification of studies that can be addressed statistically
  - Clustering of observations
  - Clustering of variables
  - Quantification of distributions
  - Comparing distributions
  - Prediction of individual observations

Statistical Classification of Scientific Questions

1. Cluster Analysis
- Focus is on identifying similar groups of observations
  - Divide a population into subgroups based on patterns of similar measurements
  - Univariate, multivariate
  - Known or unknown number of clusters
- (All variables treated symmetrically: no delineation between outcomes and groups)

Example: Cluster Analysis
- Potential for different causes for the same clinical syndrome: glucose in urine
- Identify patterns of measurements that separate subpopulations of patients with diabetes
  - Age of onset
  - Symptoms at onset (e.g., weight)
  - Auto-antibodies
  - Characteristics of epidemics

Example: Cluster Analysis
- Statistical Tasks:
  - Training sample
    - Measure age, change in weight, auto-antibodies, etc.
  - Statistical analysis
    - Cluster analysis
    - Summarize variable distributions within identified clusters
    - (Attach labels?)
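
As a rough illustration of the cluster analysis task, the sketch below runs a one-dimensional k-means on ages of onset and then summarizes each identified cluster. The slide does not name a clustering algorithm, so k-means is only one possible choice; the ages, the starting centers, and the function name `kmeans_1d` are all hypothetical.

```python
from statistics import mean

def kmeans_1d(values, centers, n_iter=20):
    """One-dimensional k-means (Lloyd's algorithm) from the given starting centers."""
    centers = list(centers)
    clusters = []
    for _ in range(n_iter):
        # Assignment step: each value joins its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda k: abs(v - centers[k]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster.
        new_centers = [mean(c) if c else centers[k] for k, c in enumerate(clusters)]
        if new_centers == centers:
            break  # assignments are stable
        centers = new_centers
    return centers, clusters

# Hypothetical ages of onset (years) for ten patients with diabetes.
ages = [8, 10, 12, 14, 11, 52, 58, 61, 49, 55]
centers, clusters = kmeans_1d(ages, centers=[min(ages), max(ages)])

# Summarize the variable distribution within each identified cluster.
for center, members in zip(centers, clusters):
    print(f"center={center}, n={len(members)}, range={min(members)}-{max(members)}")
```

Attaching labels (for example "juvenile onset" and "adult onset") would be a separate, interpretive step; the algorithm only returns groups.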

2. Clustering Variables
- Identifying hidden variables indicating groups that tend to have similar measurements of some outcome
  - Interest in some particular outcome measurement
  - Predictors that imprecisely measure some abstract quality
  - Desire to find patterns in predictors that more precisely reflect the abstract quality

Example: Factor Analysis
- Identifying barriers to patient compliance in clinical trials
  - In the Health Behavior Questionnaire, multiple variables might be used to measure
    - Self-perceived health; social support; depression
  - Desire is to
    - Find a subset of questions that would suffice
    - Identify hidden variables that affect compliance

Example: Factor Analysis
- Statistical Tasks:
  - Training sample
    - Measure response to questionnaire
  - Statistical analysis
    - Factor analysis (principal components)
    - Report contribution to factors, factor loadings
    - (Attach labels?)
    - (Draw conclusions about importance of latent variables?)

Example: Genomics/Proteomics
- Combination of clustering cases and variables
  - Measure expression of 10,000 genes on a (usually small) number of patients
  - Identify genes that tend to act the same way across patients
    - Pathways?
  - Identify groups of patients that tend to have the same patterns of gene expression
    - Subtypes of disease?

3. Quantifying Distributions
- Focus is on distributions of measurements within a population
- Scientific questions about tendencies for specific measurements within a population
  - Point estimates of summary measures
  - Interval estimates of summary measures
  - Quantifying uncertainty
  - Decisions about hypothesized values

Example: Estimate Proportions
- Proportion of women among patients with primary biliary cirrhosis (PBC)
  - Serious liver disease often leading to liver failure
  - Unknown etiology
  - Characterizing types of people who suffer from the disease may provide clues about causes
  - (About 90% of patients with PBC are women)

Example: Estimate Proportions
- Statistical Tasks
  - Sample of patients (from registry?)
    - Measure demographics, etc.
  - Statistical analysis
    - Best estimate of the proportion
    - Quantify uncertainty in that estimate
    - Compare to the known proportion of women in the general population (approximately 50%)?

Example: Estimation of Median
- Median life expectancy of patients newly diagnosed with stage II breast cancer
  - Want to know prognosis
    - Judging public health risks
    - Patients' planning (really prediction?)

Example: Estimation of Median
- Statistical Tasks
  - Sample of patients newly diagnosed with stage II breast cancer
    - Follow for survival time (may be censored)
  - Statistical analysis
    - Best estimate of the median survival (Kaplan-Meier?)
    - Quantify uncertainty in that estimate
    - Compare to some clinically important time range (e.g., 10 years)
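
A minimal sketch of how a Kaplan-Meier (K-M) estimate of median survival might be computed when some follow-up times are censored. The follow-up times, event indicators, and the function name `km_median` are hypothetical, and the median is taken here as the first event time at which the estimated survival probability falls to 0.5 or below, which is one common convention.

```python
def km_median(times, events):
    """Kaplan-Meier median: first event time where estimated survival <= 0.5.

    times  -- follow-up time for each patient
    events -- 1 if death was observed at that time, 0 if censored
    Returns None if the survival curve never drops to 0.5.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            # Multiply by the conditional probability of surviving past t.
            surv *= 1 - deaths / at_risk
            if surv <= 0.5:
                return t
        at_risk -= removed
    return None

# Hypothetical follow-up (years); 0 marks a censored observation.
times = [2, 3, 4, 5, 6, 8, 9, 12]
events = [1, 1, 0, 1, 1, 0, 1, 0]
print(km_median(times, events))
```

Quantifying uncertainty in the estimated median (for example with a confidence interval) is a further step not shown here.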

4. Comparing Distributions
- Comparing distributions of measurements across populations
  - 4a. Identifying groups that have different distributions of some measurement
  - 4b. Quantifying differences in the distribution of some measurement across predefined groups (effects or associations)
  - 4c. Quantifying differences in effects across subgroups (interactions or effect modification)

4a. Identifying Groups
- Identifying groups that have different distributions of some measurement
  - Focus is on some particular outcome measurement
  - Identify groups based on other measurements
    - E.g., quantifying distributions within subgroups
    - E.g., stepwise regression models
  - (cf. cluster analysis, where all measurements are treated symmetrically)

Example: Identifying Groups
- Chromosomal abnormalities associated with ovarian cancer
  - Cytogenetic analysis of dividing cells identifies regions of the chromosomes with defects
    - Cancer is caused by some defects, and cancer causes other defects
    - Approximately 370 identifiable regions
  - Which of the regions are the most promising to explore in more focused studies?

Example: Identifying Groups
- Statistical Tasks:
  - Sample of cancer tissues
    - Measure type of cancer (ovarian, melanoma, etc.)
    - Measure chromosomal defects
  - Statistical analysis
    - Stepwise regression models of chromosomal abnormalities predicting cancer type
    - (Use p values to rank interest in particular regions?)

Example: Identifying Groups
- Risk factors for diabetes
  - Variables most associated with diabetes risk may give clues about etiology and eventual prevention

Example: Identifying Groups
- Statistical Tasks
  - Sample subjects to measure risk factors and disease prevalence
    - Cohort study
    - Case-control study
  - Statistical analysis
    - Stepwise model building
    - (Rank most interesting variables by p value?)

4b. Detecting Associations
- Associations between variables: distributions of one variable differ across groups defined by another
  - Existence of differences
  - Direction of tendency of effect
  - First, second order relationships in a summary measure
  - Characterization of dose-response in a summary measure

Definition of an Association
- The distributions of two variables are not independent
- Independence: equivalent definitions
  - Probability of outcome and exposure is the product of
    - Overall probability of outcome, and
    - Overall probability of exposure
  - Distribution of exposure is EXACTLY the same across ALL outcome categories
  - Distribution of outcome is EXACTLY the same across ALL exposure categories
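
The product definition of independence can be checked directly on a table of counts. The sketch below treats the table as if it gave the true joint distribution; the function name `is_independent` and the first table are hypothetical, while the second table uses the MI-by-sex counts quoted later in the lecture (59 of 366 males, 32 of 367 females).

```python
def is_independent(table, tol=1e-9):
    """Check P(exposure i and outcome j) == P(exposure i) * P(outcome j) in every cell.

    table[i][j] is the count with exposure level i and outcome level j.
    """
    total = sum(sum(row) for row in table)
    exposure_p = [sum(row) / total for row in table]       # row margins
    outcome_p = [sum(col) / total for col in zip(*table)]  # column margins
    return all(
        abs(table[i][j] / total - exposure_p[i] * outcome_p[j]) <= tol
        for i in range(len(table))
        for j in range(len(table[0]))
    )

# Hypothetical table: outcome is 25% / 75% in both exposure groups.
print(is_independent([[10, 30], [20, 60]]))

# MI (yes, no) by sex (male, female), from the lecture's example.
print(is_independent([[59, 307], [32, 335]]))
```

In sampled data exact equality is almost never observed even when the variables are independent in the population, which is why the comparison of distributions is carried out with estimates, intervals, and P values rather than an exact check like this one.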

Summary Measures
- Generally we consider some summary measure of the distribution
- For instance, when we use the mean, we show an association by showing either
  - Mean outcome differs across exposure groups
  - Mean exposure differs across outcome groups

Justification
- This works because if two distributions are the same, ALL summary measures should be the same
  - If some summary measure is different, then we know the distributions are different
- HOWEVER: this means that it is easier to prove an association than to prove no association
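
A small, hypothetical illustration of why equality of one summary measure does not establish that two distributions are the same: the two made-up groups below have identical means but clearly different spread, so a comparison of means alone would miss the difference.

```python
from statistics import mean, pstdev

# Hypothetical outcome measurements in two exposure groups.
group_a = [4, 5, 5, 5, 6]
group_b = [1, 3, 5, 7, 9]

# The means agree, so a comparison of means finds no difference ...
print(mean(group_a), mean(group_b))

# ... but the standard deviations differ, so the distributions are not the same.
print(pstdev(group_a), pstdev(group_b))
```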

Example: Detecting Association
- Effect of blood cholesterol levels on risk of heart attacks
  - Understanding etiology of heart attacks may lead to prevention and/or treatment strategies

Example: Detecting Association
- Statistical tasks
  - Measure risk factors, MIs on sample
    - Cohort or case-control sample
  - Statistical analysis
    - Regression model (possibly adjusted)
      - Cohort: incidence of MIs across cholesterol levels
      - Case-control: cholesterol levels across MI status
      - (Comparison can be at many levels of detail)
    - Quantify estimates, precision, confidence in decisions

4c. Detecting Effect Modification
- Quantifying differences in effects across subgroups (interactions or effect modification)
  - Existence of interaction
  - Direction of interaction (synergy, antagonism)
  - Quantification of exact relationship of interaction

Example: Effect Modification
- Identifying whether effect of cholesterol on heart attacks differs by sex
- Comparing association between blood cholesterol level and incidence of heart attacks between sexes
  - Quantify association in men
  - Quantify association in women
  - Compare measures of association

Approach Common to #3 & #4
- In answering each scientific question, statistics typically provides four numbers
  - Best estimate
    - "Best" can be defined by frequentist or Bayesian criteria
  - Interval describing precision (lower and upper bounds)
    - Confidence interval or Bayesian credible interval
  - Quantification of belief in some hypothesis
    - P value or Bayesian posterior probability

Example: Detecting Association
- Association between sex and prevalence of MI in an elderly population
  - 59 of 366 males have had MI: 16.1%
  - 32 of 367 females have had MI: 8.7%
- Association measured by difference
  - Best estimate: prevalence 7.4 percentage points higher in males
  - Interval estimate: between 2.7 and 12.2 percentage points (95% confidence interval)
  - Strength of evidence: P value = 0.002
    - If there were no real difference, data like those observed would be pretty unlikely: the probability of a difference at least this large is 0.002
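
The four numbers in this example can be approximately reproduced from the counts on the slide. The slide does not say which interval or test was used; the sketch below assumes a Wald (normal-approximation) interval with unpooled standard error and a two-sided z test, and the function name `compare_proportions` is hypothetical. Small rounding differences from the slide's figures are to be expected.

```python
from math import sqrt, erfc

def compare_proportions(x1, n1, x2, n2, z_crit=1.96):
    """Difference in proportions with a Wald interval and two-sided z-test P value."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    # Unpooled standard error of the difference (an assumption, see lead-in).
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    ci = (diff - z_crit * se, diff + z_crit * se)
    z = diff / se
    # Two-sided tail probability of a standard normal beyond |z|.
    p_value = erfc(abs(z) / sqrt(2))
    return diff, ci, p_value

# Counts from the slide: MI in 59 of 366 males and 32 of 367 females.
diff, (lower, upper), p_value = compare_proportions(59, 366, 32, 367)
print(f"difference: {100 * diff:.1f} percentage points")
print(f"95% CI: {100 * lower:.1f} to {100 * upper:.1f} percentage points")
print(f"P value: {p_value:.3f}")
```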

5. Prediction
- Focus is on individual measurements
  - Point prediction: best single estimate for the measurement that would be obtained on a future individual
    - Continuous measurements
    - Binary measurements (discrimination)
  - Interval prediction: range of measurements that might reasonably be observed for a future individual

Example: Continuous Prediction
- Creatinine clearance
  - Creatinine
    - Breakdown product of creatine
    - Removed by the kidneys by filtration
    - Little secretion, reabsorption
  - Measure of renal function
    - Amount of creatinine cleared by the kidneys in 24 hours

Example: Continuous Prediction
- Problem: need to collect urine output (and blood creatinine) for 24 hours
- Goal: find blood, urine measures that can be obtained instantly, yet still provide an accurate estimate of a patient's creatinine clearance

Example: Continuous Prediction
- Statistical Tasks:
  - Training sample
    - Measure true creatinine clearance
    - Measure sex, age, weight, height, creatinine
  - Statistical analysis
    - Regression model that uses other variables to predict creatinine clearance
    - Quantify accuracy of predictive model (mean squared error?)
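
A minimal sketch of the two analysis steps: fit a regression model on a training sample, then quantify accuracy with mean squared error. For brevity it uses a single predictor (age) and ordinary least squares; the data values and the function names `fit_line` and `mean_squared_error` are hypothetical, and a realistic model would use several of the measured variables.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def mean_squared_error(x, y, intercept, slope):
    """Average squared gap between observed and predicted values."""
    return mean([(yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y)])

# Hypothetical training sample: age (years) and measured creatinine clearance (ml/min).
age = [30, 40, 50, 60, 70]
clearance = [120, 110, 95, 85, 70]

intercept, slope = fit_line(age, clearance)
mse = mean_squared_error(age, clearance, intercept, slope)
print(f"predicted clearance = {intercept:.1f} + ({slope:.2f}) * age, training MSE = {mse:.2f}")
```

Error measured on the same sample used to fit the model tends to be optimistic, so accuracy is better judged on patients who were not part of the training sample.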

Example: Discrimination
- Diagnosis of prostate cancer
  - Use other measurements to predict whether a particular patient might have prostate cancer
    - Demographic: age, race, (sex)
    - Clinical: symptoms
    - Biological: prostate specific antigen (PSA)
  - Goal is a diagnosis for each patient

Example: Discrimination
- Statistical Tasks:
  - Training sample
    - Gold standard diagnosis
    - Measure age, race, PSA
  - Statistical analysis
    - Regression model that uses other variables to predict prostate cancer diagnosis
    - Quantify accuracy of predictive model (ROC curve analysis?)
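
One common summary from ROC curve analysis is the area under the curve (AUC), which equals the probability that a randomly chosen case has a higher score than a randomly chosen non-case. The sketch below computes it by comparing every case with every non-case; the PSA values and the function name `auc` are hypothetical, and raw PSA stands in for the score that a fitted regression model would produce.

```python
def auc(scores_cases, scores_noncases):
    """Area under the ROC curve via all case/non-case pairs (ties count one half)."""
    wins = sum(
        1.0 if c > n else 0.5 if c == n else 0.0
        for c in scores_cases
        for n in scores_noncases
    )
    return wins / (len(scores_cases) * len(scores_noncases))

# Hypothetical PSA values (ng/ml) by gold standard diagnosis.
psa_cancer = [4.5, 6.0, 8.2, 3.0]
psa_no_cancer = [1.0, 2.5, 3.0, 5.0]

print(auc(psa_cancer, psa_no_cancer))
```

An AUC of 0.5 corresponds to a score with no discriminating ability and 1.0 to perfect separation of cases from non-cases.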

Example: Interval Prediction
- Determining normal range for PSA
  - Identify the range of PSA values that would be expected in the 95% most typical healthy males
  - Age, race specific values

Example: Interval Prediction
- Statistical Tasks:
  - Training sample
    - Measure age, race, PSA
  - Statistical analysis
    - Regression model that uses other variables to define prediction interval
      - (Mean plus/minus 2 SD?)
      - (Confidence interval for quantiles?)
    - Quantify accuracy of predictive model (coverage probabilities?)
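
A rough sketch of the "mean plus/minus 2 SD" option and of checking coverage, ignoring age and race for simplicity. The PSA values and the function names `normal_range` and `coverage` are hypothetical; the slide lists this approach with a question mark, so it should be read as one candidate rather than a recommendation.

```python
from statistics import mean, stdev

def normal_range(values, k=2.0):
    """Interval of mean +/- k sample standard deviations."""
    m, s = mean(values), stdev(values)
    return m - k * s, m + k * s

def coverage(values, lower, upper):
    """Fraction of values that fall inside [lower, upper]."""
    return sum(lower <= v <= upper for v in values) / len(values)

# Hypothetical PSA values (ng/ml) from healthy males in a training sample.
psa = [0.6, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.5, 1.8, 2.4]

lower, upper = normal_range(psa)
print(f"range: {lower:.2f} to {upper:.2f}, training coverage: {coverage(psa, lower, upper):.0%}")
```

Coverage computed on the training sample is only a first check; the more relevant question is how often the interval contains measurements from new healthy individuals.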

Comment About Prediction
- For me to consider a problem to be purely a prediction problem, interest must lie solely in the predicted value, and not in the way that value was obtained
  - E.g., in weather prediction, we might just want to know the weather tomorrow
  - We won't be trying to impress upon our audience the way it should be predicted
- I do not think this is very often the case

Lecture Outline: Next Week
- Statistical Tasks: Refining Hypotheses
- Study design
- Study structure
- Statistical analysis plan
- Sample size considerations