ACHIEVING GOOD POWER WITH CLUSTERED AND MULTILEVEL DATA Keith E. Muller Department of Health Outcomes and Policy University of Florida,

Similar documents
Selecting a Valid Sample Size for Longitudinal and Multilevel Studies in Oral Behavioral Health

Understanding the Hypothesis

MULTIVARIATE, MULTIVARIABLEMODELS FOR DEFENSIBLE INFERENCE ABOUT SHAPE

Statistical Primer for Cardiovascular Research

f WILEY ANOVA and ANCOVA A GLM Approach Second Edition ANDREW RUTHERFORD Staffordshire, United Kingdom Keele University School of Psychology

Inferential Statistics

Introduction to Multilevel Models for Longitudinal and Repeated Measures Data

Lecture Outline. Biost 517 Applied Biostatistics I. Purpose of Descriptive Statistics. Purpose of Descriptive Statistics

That's a nice elliptical plot, for which the the "ordinary" Pearson r correlation is.609.

Lecture Outline. Biost 590: Statistical Consulting. Stages of Scientific Studies. Scientific Method

How to analyze correlated and longitudinal data?

Introduction to Multilevel Models for Longitudinal and Repeated Measures Data

investigate. educate. inform.

Biostatistics for Med Students. Lecture 1

Certificate Courses in Biostatistics

Interaction Effects: Centering, Variance Inflation Factor, and Interpretation Issues

Statistics: A Brief Overview Part I. Katherine Shaver, M.S. Biostatistician Carilion Clinic

Experimental Studies. Statistical techniques for Experimental Data. Experimental Designs can be grouped. Experimental Designs can be grouped

Journal of Biostatistics and Epidemiology

Data and Statistics 101: Key Concepts in the Collection, Analysis, and Application of Child Welfare Data

Section on Survey Research Methods JSM 2009

Appropriate Statistical Methods to Account for Similarities in Binary Outcomes Between Fellow Eyes

On the purpose of testing:

Profile Analysis. Intro and Assumptions Psy 524 Andrew Ainsworth

11/18/2013. Correlational Research. Correlational Designs. Why Use a Correlational Design? CORRELATIONAL RESEARCH STUDIES

What is Multilevel Modelling Vs Fixed Effects. Will Cook Social Statistics

12/30/2017. PSY 5102: Advanced Statistics for Psychological and Behavioral Research 2

11/24/2017. Do not imply a cause-and-effect relationship

Tutorial 3: MANOVA. Pekka Malo 30E00500 Quantitative Empirical Research Spring 2016

OUTCOMES OF DICHOTOMIZING A CONTINUOUS VARIABLE IN THE PSYCHIATRIC EARLY READMISSION PREDICTION MODEL. Ng CG

Statistics is the science of collecting, organizing, presenting, analyzing, and interpreting data to assist in making effective decisions

Longitudinal and Hierarchical Analytic Strategies for OAI Data

The University of North Carolina at Chapel Hill School of Social Work

Survey research (Lecture 1) Summary & Conclusion. Lecture 10 Survey Research & Design in Psychology James Neill, 2015 Creative Commons Attribution 4.

Survey research (Lecture 1)

BIOSTATISTICAL METHODS AND RESEARCH DESIGNS. Xihong Lin Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA

3 CONCEPTUAL FOUNDATIONS OF STATISTICS

showcase the utility of models designed to incorporate zeros from multiple generating processes. We will examine predictors of absences as a vehicle

Prepared by: Assoc. Prof. Dr Bahaman Abu Samah Department of Professional Development and Continuing Education Faculty of Educational Studies

Small-area estimation of mental illness prevalence for schools

Basic Biostatistics. Chapter 1. Content

What are Indexes and Scales

Business Statistics Probability

Weight Adjustment Methods using Multilevel Propensity Models and Random Forests

List of Figures. List of Tables. Preface to the Second Edition. Preface to the First Edition

VARIABLES AND MEASUREMENT

Reveal Relationships in Categorical Data

Methods for Computing Missing Item Response in Psychometric Scale Construction

STAT 503X Case Study 1: Restaurant Tipping

Selected Topics in Biostatistics Seminar Series. Missing Data. Sponsored by: Center For Clinical Investigation and Cleveland CTSC

Lecture Outline Biost 517 Applied Biostatistics I. Statistical Goals of Studies Role of Statistical Inference

Summary & Conclusion. Lecture 10 Survey Research & Design in Psychology James Neill, 2016 Creative Commons Attribution 4.0

Centering Predictors

PRINCIPLES OF STATISTICS

Class 7 Everything is Related

Analysis of Environmental Data Conceptual Foundations: En viro n m e n tal Data

INTRODUCTION TO STATISTICS SORANA D. BOLBOACĂ

Daniel Boduszek University of Huddersfield

Citation for published version (APA): Ebbes, P. (2004). Latent instrumental variables: a new approach to solve for endogeneity s.n.

Statistics as a Tool. A set of tools for collecting, organizing, presenting and analyzing numerical facts or observations.

Ecological Statistics

How to interpret scientific & statistical graphs

Reliability of Ordination Analyses

Doctoral Dissertation Boot Camp Quantitative Methods Kamiar Kouzekanani, PhD January 27, The Scientific Method of Problem Solving

Use of the Quantitative-Methods Approach in Scientific Inquiry. Du Feng, Ph.D. Professor School of Nursing University of Nevada, Las Vegas

Statistics is the science of collecting, organizing, presenting, analyzing, and interpreting data to assist in making effective decisions

Generalized Estimating Equations for Depression Dose Regimes

RESEARCH METHODS. A Process of Inquiry. tm HarperCollinsPublishers ANTHONY M. GRAZIANO MICHAEL L RAULIN

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo

Technical Track Session IV Instrumental Variables

A Brief (very brief) Overview of Biostatistics. Jody Kreiman, PhD Bureau of Glottal Affairs

Multivariable Systems. Lawrence Hubert. July 31, 2011

Module 14: Missing Data Concepts

A COMPARISON OF IMPUTATION METHODS FOR MISSING DATA IN A MULTI-CENTER RANDOMIZED CLINICAL TRIAL: THE IMPACT STUDY

Before we get started:

How many speakers? How many tokens?:

Hypothesis Testing. Richard S. Balkin, Ph.D., LPC-S, NCC

Chapter Eight: Multivariate Analysis

Choosing an Approach for a Quantitative Dissertation: Strategies for Various Variable Types

Multiple Regression Models

Lecture Slides. Elementary Statistics Eleventh Edition. by Mario F. Triola. and the Triola Statistics Series 1.1-1

The Nature of Regression Analysis

Fundamentals of Psychophysics

Chapter 14: More Powerful Statistical Methods

How to Model Mediating and Moderating Effects. Hsueh-Sheng Wu CFDR Workshop Series November 3, 2014

Unit outcomes. Summary & Conclusion. Lecture 10 Survey Research & Design in Psychology James Neill, 2018 Creative Commons Attribution 4.0.

Unit outcomes. Summary & Conclusion. Lecture 10 Survey Research & Design in Psychology James Neill, 2018 Creative Commons Attribution 4.0.

Use of GEEs in STATA

TACKLING SIMPSON'S PARADOX IN BIG DATA USING CLASSIFICATION & REGRESSION TREES

Study Guide #2: MULTIPLE REGRESSION in education

IR 211 Professor Graham Fall 2015 Final Study Guide, Complete

Chapter 1. Introduction

Midterm project due next Wednesday at 2 PM

What you should know before you collect data. BAE 815 (Fall 2017) Dr. Zifei Liu

Objective: To describe a new approach to neighborhood effects studies based on residential mobility and demonstrate this approach in the context of

Demonstrating multilevel structural equation modeling for testing mediation: Effects of self-critical perfectionism on daily affect

STATS8: Introduction to Biostatistics. Overview. Babak Shahbaba Department of Statistics, UCI

Transcription:

ACHIEVING GOOD POWER WITH CLUSTERED AND MULTILEVEL DATA Keith E. Muller Department of Health Outcomes and Policy University of Florida, KMuller@ufl.edu www.health-outcomes-policy.ufl.edu/muller Supported in part by NIAAA R01-AA013458-01, NHLBI R01-HL091005, NIDCR U54-DE019261, and NIDCR R01-DE020832-01A1 Disclosures: Muller has been a paid consultant to SAS; his wife worked for SAS for 5 years. 1

1. Recognizing the Challenge 2. Achieving Good Power 3. Overview of Using GLIMMPSE Software Bibliography 2

1. Recognizing the Challenge 1.1 Motivation Cluster and multilevel sampling can help: More sensitivity (power) Lower costs (recruit fewer participants) Cluster and multilevel sampling can hinder: Improper analysis usually underestimates variance and inflates type I error rate. Need to know when and how to use to gain the advantages and avoid the problems. 3

1.2 Definitions All of Statistics: estimation and inference (testing) Intervention study: modeling relationships (estimation) and testing hypotheses Response œ0(predictor) With random assignment to treatment, Dependent œ0(independent) Statistical model: Response œ0(predictor) error 4

Statistical model: response œ0(predictor) error Responses: ] variables Predictors: \ variables Errors: I ]œ0\ I Error Variance: job security for statisticians" 5

1.3 Diagnostic Questions Question 1: How Many Variables? # responses # predictors Model 1 1 univariate 1 many multi-variable many 1 multivariate many many multivariate 6

Same ], many times: Multivariate, and also Repeated Measures (REPM) Measures between sampling units (person, machine, hospital...) are independent. Measures within (repeated measures: Time," limb, organ,...) are not independent (correlated). Many distinct ] 's, many times: doubly multivariate, collections of repeated measures 7

Independent observations versus Multivariate (many ] 's) Repeated Measures Correlated observations Non-independent observations Emphasize: statistics different! Improper analysis can severely bias results Theory same for multivariate and REPM while interpretation and analysis differ. 8

Question 2: Variable Types? Scale Data (Error Distribution) Nominal: Dichotomous Polychotomous Ordered Categorical Ordinal Order with infinitely many values. Interval Gaussian Continuous Ratio Non-Gaussian Inaccuracy in small sample inference for most methods. 9

Question 3: Sampling Pattern? Design #Times # Indep. Timing Sampling Units Cross Sectional one N -- Everything else: not cross sectional, multivariate in some sense 10

Multivariate Commensurate Repeated Measures Figure 1. Categories of Multivariate Data 11

All variables measured in same units, commensurate. What is the independent sampling unit, in contrast to the observational unit? # observations œ # ISU # Times Rœ# independent sampling units (ISU) : 3 œ # observations for independent sampling unit 3 8œtotal # observations œ : 3œ" In a cross sectional design because 8œR : " 3 R 3 12

Table 1. Repeated Measures Sampling Patterns Type # Obs #ISU Timing per ISU REPM " R consistent Crossover : R alternating Longitudinal : 3, varies R inconsistent Time series 8 1 regular Cluster : 3 # Clust exchangeable Survey : # Clust exchangeable 3 Split Plot terminology originated in agriculture 13

Hierarchal, or multilevel data have "nested" sampling, with combinations of dimensions. Examples of two or more dimensions of sampling: teeth within a person teeth within a person within a clinic teeth within a person within a clinic measured repeatedly over time 14

1's? 10's? 100's? 1000's? more? Question 4: # Independent Sampling Units per Response Variable? High Dimension, Low Sample Size (HDLSS): Examples of more observations than people: genomics, transcripomics, medical imaging 15

Question 1: How Many Variables? Question 2: Variable Types? Question 3: Sampling Scheme? Question 4: # ISUs per Response? Answers to questions allow recognizing the challenge. Caution: Usage of some definitions varies widely! 16

1.4 Analysis Methods Choosing a Technique Accurate estimation (means, proportions)? Defensible inference (type I error rate)? Property 1: How Many Variables? Property 2: Variable Types? Property 3: Sampling Scheme? Property 4: # Persons per Response? 17

5 6 8 5 œ 1200 answers to 4 questions. Each a potentially distinct analysis. Answer requires a career, not a lecture! Every analysis an approximation. Seek accurate estimation and defensible inference (tests). Answers changing every few years due to advances in computing. Being a statistician means never having to say you're certain." (ASA T-shirt) 18

Shop smart, learn limitations. Report any limitations in publications. Accurate small sample tests not available for many interesting REPM models, even assuming Gaussian errors. Worst problem for non-gaussian data. Availability of software to fit a model, even in major packages, does not guarantee method defensible for small N studies. 19

1. Recognizing The Challenge r 1.1 Motivation: Special Handling r 1.2 Definitions r 1.3 Diagnosis: Answer 4 Questions r 1.4 Analysis Methods 2. Achieving Good Power 2.1 Use the Highest Scale Possible 2.2 Align Power Analysis with Data Analysis 2.3 Conduct Sensitivity Analysis 3. Overview of Using GLIMMPSE 20

2. Achieving Good Power 2.1 Use the Highest Scale Possible Nominal, ordinal, (interval, ratio). RC MacCallum, S Zhang, KJ Preacher, and DD Rucker (2002) On the Practice of Dichotomization of Quantitative Variables and subsequent related papers attempting to fight back. Clinical thinking encourages (I have seen opposite position) 21

2.2 Align Power Analysis with Data Analysis Method must match method; hypothesis must match hypothesis. Muller, LaVange, Ramey and Ramey (1992) examples of t test power being wrong for repeated measures: low example (child growth), high example (kidney disease). Test what must be tested, but may not have high power. Seek high power for what you predict. 22

2.3 Conduct Sensitivity Analysis Change dimensions, and ratios of dimensions. How many people, but also repeated measures, clusters, blocks. Use smart spacing. a) log2(dose), i.e. doublings: 2, 4, 8 mg/kg, weeks, etc. b) Use only Min and Max for early science. Omit middle doses. Create power curves. Try different variances for continuous data. Try difference event rates for binary outcome. 23

Use confidence intervals for power values based on estimates (Taylor and Muller, 1995). Available in GLIMMPSE (? now or soon). 24

1. Recognizing The Challenge r1.1 r1.2 r1.3 r1.4 2. Achieving Good Power r2.1 Use the Highest Scale Possible r2.2 Align Power Analysis with Data Analysis r2.3 Conduct Sensitivity Analysis 3. Overview of Using GLIMMPSE From APA presentation: slides 70-125 25

BIBLIOGRAPHY Most articles listed can be downloaded or linked from www.health-outcomes-policy.ufl.edu/muller Kleinbaum DG, Kupper LL, Nizam A, and Muller KE (2007) Applied Regression Analysis and Other Multivariable Methods. 4th edition. Boston: Duxbury Press. Muller KE and Stewart PW (2006) Linear Model Theory; Univariate, Multivariate, and Mixed Models. New York: Wiley. Muller KE and Fetterman BA (2002) Regression and ANOVA: An Integrated Approach Using SAS Software. Cary, NC: SAS Institute. Cheng J, Edwards LJ, Maldonado-Molina MM, Komro KA, and Muller KE (2010) Real longitudinal data analysis for real people: building a good enough mixed model, Statistics in Medicine, 29, 504-520.87 Edwards LJ, Muller KE, Wolfinger RD, Qaqish BF, and Schabenberger O (2008) An R-square statistic for fixed effects in the linear mixed model, Statistics in Medicine, 27, 6137-6157. Gurka, MJ, Muller KE, and Edwards LJ (2011) Avoiding bias in mixed model inference for fixed effects, Statistics in Medicine, in press. Muller KE (2009) Analysis of Variance (ANOVA concepts and computations), Wiley Interdisciplinary Reviews: Computational Statistics. Muller KE, Edwards LJ, Simpson SL, and Taylor DJ (2007) Statistical tests with accurate size and power for balanced linear mixed models, Statistics in Medicine, 26, 3639 3660. DOI 10.1002/sim.2827 Simpson SL, Edwards LJ, Muller KE, Sen PK, and Styner MA (2010) A linear exponent AR(1) family of correlation structures, Statistics in Medicine, 29, 1825 1838. Gurka MJ, Edwards LJ, Muller KE, and Kupper LL (2006) Extending the Box-Cox transformation to the linear mixed model, Journal of the Royal Statistical Society Series A, Statistics in Society, 169, 255-272.