Supplementary Online Content

Similar documents
Supplementary Online Content

The Statistical Analysis of Failure Time Data

Low-Fat Dietary Pattern Intervention Trials for the Prevention of Breast and Other Cancers

This paper is available online at

Supplementary Appendix

The SAS SUBTYPE Macro

Supplementary Online Content

Comparison And Application Of Methods To Address Confounding By Indication In Non- Randomized Clinical Studies

Supplementary Online Content

Supplementary Online Content

Recreational physical activity and risk of triple negative breast cancer in the California Teachers Study

Midterm Exam ANSWERS Categorical Data Analysis, CHL5407H

Introduction to Survival Analysis Procedures (Chapter)

Supplementary Online Content

Fruit and vegetable consumption in adolescence and early adulthood and risk of breast cancer: population based cohort study

8/10/2012. Education level and diabetes risk: The EPIC-InterAct study AIM. Background. Case-cohort design. Int J Epidemiol 2012 (in press)

How to analyze correlated and longitudinal data?

Application of Local Control Strategy in analyses of the effects of Radon on Lung Cancer Mortality for 2,881 US Counties

Supplementary Table 1. Association of rs with risk of obesity among participants in NHS and HPFS

Supplementary Appendix

Pinkal Desai MD MPH Weill Cornell Medical College New York, NY WHI Annual Meeting, May 5-6, 2016

NORTH SOUTH UNIVERSITY TUTORIAL 2

MODEL SELECTION STRATEGIES. Tony Panzarella

Spatiotemporal models for disease incidence data: a case study

Lecture Outline. Biost 590: Statistical Consulting. Stages of Scientific Studies. Scientific Method

Supplementary Online Content

NIH Public Access Author Manuscript Br J Nutr. Author manuscript; available in PMC 2015 September 28.

Commentary SANDER GREENLAND, MS, DRPH

BIOSTATISTICAL METHODS AND RESEARCH DESIGNS. Xihong Lin Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA

Supplementary Online Content

Applied Medical. Statistics Using SAS. Geoff Der. Brian S. Everitt. CRC Press. Taylor Si Francis Croup. Taylor & Francis Croup, an informa business

Basic Biostatistics. Chapter 1. Content

Supplementary Online Content

SUPPLEMENTAL MATERIAL

Prediction Model For Risk Of Breast Cancer Considering Interaction Between The Risk Factors

Impact of BMI on pathologic complete response (pcr) following neo adjuvant chemotherapy (NAC) for locally advanced breast cancer

Mammographic density and risk of breast cancer by tumor characteristics: a casecontrol

BIOSTATISTICAL METHODS

Supplementary Online Content

SUPPLEMENTARY MATERIAL

Supplementary Online Content

New Insights into Breast Cancer Risk Reduction

Content. Basic Statistics and Data Analysis for Health Researchers from Foreign Countries. Research question. Example Newly diagnosed Type 2 Diabetes

Unit 1 Exploring and Understanding Data

The SAS contrasttest Macro

Supplementary Online Content

Part 8 Logistic Regression

EPI 200C Final, June 4 th, 2009 This exam includes 24 questions.

Daniel Boduszek University of Huddersfield

Presenter Disclosure Information Karine Sahakyan, MD, PhD, MPH

Chapter 13 Estimating the Modified Odds Ratio

8/24/2011. Study Goal. Study Design. Patient Attributes Influencing Pain and Pain Management in Postoperative Total Knee Arthroplasty Patients

OPCS coding (Office of Population, Censuses and Surveys Classification of Surgical Operations and Procedures) (4th revision)

Package speff2trial. February 20, 2015

Supplementary Online Content

Week 8 Hour 1: More on polynomial fits. The AIC. Hour 2: Dummy Variables what are they? An NHL Example. Hour 3: Interactions. The stepwise method.

Development, validation and application of risk prediction models

Appendix: Supplementary tables [posted as supplied by author]

Anthropometry: What Can We Measure & What Does It Mean?

COMMENTARY: DATA ANALYSIS METHODS AND THE RELIABILITY OF ANALYTIC EPIDEMIOLOGIC RESEARCH. Ross L. Prentice. Fred Hutchinson Cancer Research Center

Locoregional treatment Session Oral Abstract Presentation Saulo Brito Silva

CONDITIONAL REGRESSION MODELS TRANSIENT STATE SURVIVAL ANALYSIS

Supplementary Online Content

Novel Methods in the Visualization of Transitional Phenomena

BECAUSE OF THE BENEFIT OF

Biostatistics for Med Students. Lecture 1

Landmarking, immortal time bias and. Dynamic prediction

Summary. 20 May 2014 EMA/CHMP/SAWP/298348/2014 Procedure No.: EMEA/H/SAB/037/1/Q/2013/SME Product Development Scientific Support Department

Only Estrogen receptor positive is not enough to predict the prognosis of breast cancer

A Methodological Issue in the Analysis of Second-Primary Cancer Incidence in Long-Term Survivors of Childhood Cancers

Propensity Score Methods for Causal Inference with the PSMATCH Procedure

Genomic Profiling of Tumors and Loco-Regional Recurrence

Person-years; number of study participants (number of cases) HR (95% CI) P for trend

Biostatistics II

Multivariate dose-response meta-analysis: an update on glst

Web Extra material. Comparison of non-laboratory-based risk scores for predicting the occurrence of type 2

Dan Byrd UC Office of the President

Learning Objectives 9/9/2013. Hypothesis Testing. Conflicts of Interest. Descriptive statistics: Numerical methods Measures of Central Tendency

Continuous update of the WCRF-AICR report on diet and cancer. Protocol: Breast Cancer. Prepared by: Imperial College Team

IJC International Journal of Cancer

Supplementary Online Content

Math 215, Lab 7: 5/23/2007

Headache, migraine and risk of brain tumors in women: prospective cohort study

Insulin Resistance and Long-Term Cancer-Specific and All-Cause Mortality: The Women s Health Initiative (WHI)

Bayesian graphical models for combining multiple data sources, with applications in environmental epidemiology

Ovarian cancer example: h(t rx, age) = h(t) exp(β1 rx + β2 age) > cph1 = coxph(surv(futime, fustat)~rx+age, ovarian) > summary(cph1)

Lecture 4. Confounding

Up-to-date survival estimates from prognostic models using temporal recalibration

CHL 5225 H Advanced Statistical Methods for Clinical Trials. CHL 5225 H The Language of Clinical Trials

Genetic alterations of histone lysine methyltransferases and their significance in breast cancer

SUPPLEMENTARY DATA. Supplementary Figure S1. Cohort definition flow chart.

IJC International Journal of Cancer

Assessment of Risk Recurrence: Adjuvant Online, OncotypeDx & Mammaprint

Chapter 14: More Powerful Statistical Methods

Deanna Schreiber-Gregory Henry M Jackson Foundation for the Advancement of Military Medicine. PharmaSUG 2016 Paper #SP07

James H. Liu, M.D. Arthur H. Bill Professor Chair of Reproductive Biology Dept of Obstetrics and Gynecology

Supplementary Online Content

An Introduction to Multiple Imputation for Missing Items in Complex Surveys

ESM1 for Glucose, blood pressure and cholesterol levels and their relationships to clinical outcomes in type 2 diabetes: a retrospective cohort study

Supplementary Online Content

Transcription:

Supplementary Online Content Neuhouser ML, Aragaki AK, Prentice RL, et al. Overweight, obesity, and postmenopausal invasive breast cancer risk: a secondary analysis of the Women s Health Initiative randomized clinical trials [published online June 11, 2015]. JAMA Oncol. doi: 10.1001/jamaoncol.2015.1546. emethods efigure 1. Multivariable adjusted hazard ratios, compared to women of normal weight (BMI < 25 kg/m2), for total incident invasive breast cancer, breast cancer characteristics and other breast cancer outcomes in the WHI Clinical Trial efigure 2. Multivariable adjusted hazard ratios, compared to women of normal weight (BMI < 25 kg/m2), for total incident invasive breast cancer by select subgroups in the WHI Clinical Trial efigure 3. Multivariate associations of BMI with breast cancer risk (a) and weight with breast cancer risk (b)

emethods 1. Statistical methods and data structures for the competing risk analysis (Table 2 and Supplementary Figure 1) and weight change (Table 3). 1.1 Conceptual model Hazard ratio estimates and corresponding comparisons of trends associated with baseline BMI group by invasive breast cancer characteristics were computed via a competing risk analysis. The baseline hazard function may differ among characteristic types (e.g., ER/PR status), so the Cox proportional hazards models are stratified by characteristic type h k (t X,Z) = h k,0 (t) * exp(b k *X + C*Z) where h k,0 (t) is the baseline type-specific hazard for the k th type (e.g., ER+/PR+), X is baseline BMI group, B k the effect of BMI group for type k, Z represents a common set of covariates, and C are their corresponding coefficients that are assumed constant across characteristic type. 1.2 Data structure To make use of commonly available software (e.g., SAS, R), it is useful to create a dataset with an amenable structure. The dataset contains one stratum for each of the k characteristic types with all of participants appearing k times. For example, pptid start stop BMI event estrata 1 0 2.1 >=35 1 ER+PR+ 1 0 2.1 >=35 0 ER+PR- 1 0 2.1 >=35 0 ER-PR+ 1 0 2.1 >=35 0 ER-PR- 27 0 6.2 < 25 0 ER+PR+ 27 0 6.2 < 25 0 ER+PR- 27 0 6.2 < 25 0 ER-PR+ 27 0 6.2 < 25 1 ER-PR- these two participants experienced an invasive breast cancer at 2.1 and 6.2 years after entry into the WHI CT, where the first participant had an ER+PR+ cancer and the later had an ER-PR- cancer. 1.3 Estimation of HR(95%CI)

The model to estimate HRs for BMI group can be operationalized in a straightforward fashion with SAS s PROC PHREG. For example, PROC PHREG DATA=data; CLASS BMI(ref='< 25') estrata(ref=' ER+PR+') &catvar /ORDER=INTERNAL ; MODEL (start,stop)*event(0)= BMI BMI*eStrata &catvar &convar/ties= EXACT; STRATA &staticstratvar &TDstratvar estrata; ID pptid; HAZARDRATIO "BMI group HRs for ER+PR+" BMI / at (estrata='er+pr+') diff=ref; HAZARDRATIO "BMI group HRs for ER+PR-" BMI / at (estrata='er+pr-') diff=ref; HAZARDRATIO "not shown" BMI / at (estrata='er-pr+') diff=ref; HAZARDRATIO "BMI group HRs for ER-PR-" BMI / at (estrata='er-pr-') diff=ref; RUN; where estrata designates type of ER/PR status, &catvar is a list of categorical covariates (e.g., race/ethnicity), &convar is a list of continuous covariates (e.g., age), &staticstratvar is a list of the time-invariant strata (e.g., HT trial randomization group), &TDstratvar is a list of time-dendent strata (e.g., CaD trial randomization group). The salient features for the competing risk analysis have been bolded. As described, the baseline hazard is allowed to vary by characteristic type (SAS variable estrata), as well as accompanying HR estimates for BMI group by including an interaction term (SAS MODEL variable BMI * estrata). Since there is at most one invasive breast cancer event per person, the usual estimates of variance are valid, and use of a robust(sandwich) estimate is not necessary. 1.4 Test for heterogeneity of trend The model to test whether trends for the association between cancer risk and BMI group differ by characteristic type can operationalized with PROC PHREG DATA=data ; CLASS estrata(ref=' ER+PR+') &catvar / ORDER=INTERNAL; MODEL (start,stop)*event(0)= BMI BMI*eStrata &catvar &convar/ties= EXACT; STRATA &staticstratvar &TDstratvar estrata; ID pptid; CONTRAST 'Test of heterogeneity' BMI*eStrata 1 0 0, BMI*eStrata 0 0 1 / TEST (SCORE); RUN;

where BMI group was purposefully omitted from the SAS CLASS statement, so that BMI group would be represented by ordinal numbers (i.e., 1, 2, 3, 4) and fit as a linear continuous variable. Consequently, the test for heterogeneity of trend is a 2 degree-of-freedom test and tests the null hypothesis of B ER+PR+ = B ER+PR-, B ER+PR+ = B ER-PR-. Please note that SAS codes estrata as estrata ER+PR+ 0 0 0 ER+PR- 1 0 0 ER-PR+ 0 1 0 ER-PR- 0 0 1 and due to apriori concerns regarding the validity of the ER-PR+ classification, no inferences were made regarding this characteristic type. Similar methods may be used to perform competing risk analysis for the remaining breast cancer characteristics shown in Table 2. 1.5 Additional modeling methods In the above example, time-to-event is represented by the counting-process intervals (start, stop], rather than simply timeto-event. This representation is useful because it facilitates straightforward inclusion of time-dependent covariates and timedependent strata in our Cox proportional hazards model. For example, suppose we intend to account for randomization into the CaD trial by time-dependent strata, and the first participant was randomized, a year after entry, to the placebo arm of the CaD trial. Then the data structure of the first participant is represented by

pptid start stop BMI event estrata CaD 1 0 1.0 >=35 0 ER+PR+ NR 1 0 1.0 >=35 0 ER+PR- NR 1 0 1.0 >=35 0 ER-PR+ NR 1 0 1.0 >=35 0 ER-PR- NR 1 1.0 2.1 >=35 1 ER+PR+ Placebo 1 1.0 2.1 >=35 0 ER+PR- Placebo 1 1.0 2.1 >=35 0 ER-PR+ Placebo 1 1.0 2.1 >=35 0 ER-PR- Placebo and time-dependent strata can be modeled by including the variable CaD in the STRATA statement; NR stands for not randomized. For the analysis of weight change shown in Table 3, the dataset can be further expanded to include baseline weight and annual weight measurements as a time-dependent variable, which can be used to create a time-dependent weight change variable (wttd minus wt0). In this example, the participant had annual weight measurements of 92.8 and 93.2 that occurred 1.0 and 1.9 years after entry. pptid start stop BMI event estrata CaD wt0 wttd 1 0 1.0 >=35 0 ER+PR+ NR 98.1 98.1 1 0 1.0 >=35 0 ER+PR- NR 98.1 98.1 1 0 1.0 >=35 0 ER-PR+ NR 98.1 98.1 1 0 1.0 >=35 0 ER-PR- NR 98.1 98.1 1 1.0 1.9 >=35 0 ER+PR+ Placebo 98.1 92.8 1 1.0 1.9 >=35 0 ER+PR- Placebo 98.1 92.8 1 1.0 1.9 >=35 0 ER-PR+ Placebo 98.1 92.8 1 1.0 1.9 >=35 0 ER-PR- Placebo 98.1 92.8 1 1.9 2.1 >=35 1 ER+PR+ Placebo 98.1 93.2 1 1.9 2.1 >=35 0 ER+PR- Placebo 98.1 93.2 1 1.9 2.1 >=35 0 ER-PR+ Placebo 98.1 93.2 1 1.9 2.1 >=35 0 ER-PR- Placebo 98.1 93.2

For the analysis in Table 3, we do not differentiate risk by ER/PR status (or other characteristics) so the data structure can be collapsed over ER/PR status. Thus the first participant is represented by pptid start stop BMI event CaD wt0 wttd 1 0 1.0 >=35 0 NR 98.1 98.1 1 1.0 1.9 >=35 0 Placebo 98.1 92.8 1 1.9 2.1 >=35 1 Placebo 98.1 93.2 2. Statistical methods for the exploratory analysis of the association between invasive breast cancer risk and baseline BMI (Supplementary Figure 3). 2.1 Conceptual model In Cox proportional hazards regression, a continuous predictor variable, X, is often modeled as h(t X,Z) = h 0 (t) * exp(b*x). Therefore a linear relationship is assumed between the exposure variable of interest (e.g., baseline BMI) and the log hazard of the event (e.g., invasive breast cancer). To examine whether the functional relationship is well approximated by a linear fit (i.e., B*X), a non-parametric smooth is fit for comparison. The non-parametric component, S(X), can be directly estimated by local polynomials or splines in commonly available software such as R(SAS) with existing functions (procedures) like coxph (PROC PHREG); the function coxph can be found in the R-library survival. 2.2 Data structure Datasets described in section 1.2, that accommodate time-dependent covariates or strata, can also be used when nonparametric components are included. Since we are modeling time to first event, there is only one event per person, thus the usual estimates of variance may be used. 2.3 Implementation In R, S(X) can be directly modeled in a Cox regression model with using penalized smoothing splines; the R function is psplines( ). Specifically, fits <- coxph(surv(start,stop,event) ~ pspline(bmi), data = data)

where bmi is a continuous variable of BMI measures (e.g., 98.1, 72.5), start & stop are survival times in counting-process format (start, stop], and event is invasive breast cancer; covariates and strata can readily be included to the right hand side of ~. The default smooth uses 4 degrees-of-freedom, or the user may specify their desired smoothness. Smoothness can also be chosen objectively via Akaike s information criteria (AIC) by fits <- coxph(surv(start,stop,event) ~ pspline(bmi,df=0), data = data). The standard output from above, provides an approximate test of non-linearity. A more formal likelihood ratio test can be obtained by modeling BMI with a linear term, and fitting fitl <- coxph(surv(start,stop,event) ~ bmi, data = data) and then performing a likelihood ratio test between the two fitted models, fits and fitl. 3. References An invaluable reference for statistical practitioners is the wonderful book by Therneau and Grambsch (2000). Their book provides a comprehensive guide to survival analysis that includes counting processes, time-dependent covariates, timedependent strata, competing risk analyses, non-parametric regression, and beyond. Therneau, Terry M., and Patricia M. Grambsch. Modeling survival data: extending the Cox model. Springer Science & Business Media, 2000. SAS PROC PHREG, SAS version 9.3 (Cary, NC) Therneau T (2014). _A Package for Survival Analysis in S_. R package version 2.37-7, <URL: http://cran.r-project.org/package=survival>.

efigure 1

efigure 2

efigure Legends efigure 1. Multivariable adjusted hazard ratios, compared to women of normal weight (BMI < 25 kg/m 2 ), for total incident invasive breast cancer, breast cancer characteristics and other breast cancer outcomes in the WHI Clinical Trial. Vertical reference lines correspond to the estimated hazard ratio for the main effect of each BMI group in the primary analysis. Complete summary statistics can be found in Table 2 of the manuscript. Analyses were adjusted for age, race/ethnicity, education, parity, age at first birth, bilateral oophorectomy, family history of breast cancer, E-alone use and duration, E+P use and duration, smoking status, diabetes, alcohol consumption, and stratified by baseline age group, HT trial randomization group, dietary trial randomization group, hysterectomy status, CaD trial randomization group (time-dependent) and extended follow-up (time-dependent). The p-value corresponds to either a trend test for the main effect of BMI on invasive breast cancer or other breast cancer endpoints, or a test of heterogeneity for trends between BMI and invasive breast cancer subtypes. efigure 2. Multivariable adjusted hazard ratios, compared to women of normal weight (BMI < 25 kg/m 2 ), for total incident invasive breast cancer by select subgroups in the WHI Clinical Trial. Vertical reference lines correspond to the estimated hazard ratio for the main effect of each BMI group in the primary analysis. Complete summary statistics can be found in Table 4 of the manuscript. Covariate adjustments were similar to the primary analysis described above. P-values correspond to the statistical significance of a 1-df test of trend for the main effect of BMI on invasive breast cancer risk in the WHI CT, or by cohorts defined by baseline hysterectomy status. The remaining p-values correspond to tests of interaction for subgroups within their respective cohorts. Women with uterus includes only participants that reported not having had a hysterectomy and were randomized to any WHI clinical trial (n= 38981). Women without uterus includes only participants that reported having had a hysterectomy and were randomized to any WHI clinical trial (n= 28158). efigures 3a and 3b. Multivariate associations of BMI with breast cancer risk (a) and weight with breast cancer risk (b). These are nonparametric fits(splines) and the smoothing parameter as chosen via Akaike information criteria (AIC), The smooth of weight also included height as a covariate.