Friday, September 9, :00-11:00 am Warwick Evans Conference Room, Building D Refreshments will be provided at 9:45am


The Role of the Biostatistician in Cancer Research
Edmund A. Gehan, PhD
Professor Emeritus, Department of Biostatistics, Bioinformatics and Biomathematics
Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, Washington, DC

This paper considers triumphs and challenges for biostatisticians working in oncology at the beginning of the 21st century. The impact of three major 20th-century articles in biostatistics is considered: Cornfield's 1951 paper on estimating comparative rates from clinical data; Mantel and Haenszel's 1959 paper on obtaining summary measures of relative risk, adjusting for stratification factors in epidemiological studies; and D. R. Cox's 1972 paper, which developed the proportional hazards model for evaluating the effect of covariates on survival time outcomes. Biostatistical challenges for the 21st century are considered for the areas of clinical trials, survival analysis, and statistical genetics.

Friday, September 9, :00-11:00 am
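For readers unfamiliar with the Mantel-Haenszel summary measure mentioned in the abstract above, the following is a minimal sketch of the pooled odds ratio across strata. The function name, the table layout, and the example counts are illustrative assumptions and are not taken from the talk.

```python
from typing import Iterable, Tuple

# Each stratum is a 2x2 table (a, b, c, d), an assumed layout:
#   a = exposed cases,    b = unexposed cases,
#   c = exposed controls, d = unexposed controls.
Table = Tuple[float, float, float, float]


def mantel_haenszel_or(strata: Iterable[Table]) -> float:
    """Pooled odds ratio: sum(a*d/n) / sum(b*c/n) over strata."""
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty stratum contributes nothing
        numerator += a * d / n
        denominator += b * c / n
    if denominator == 0:
        raise ValueError("pooled odds ratio is undefined (zero denominator)")
    return numerator / denominator


# Made-up counts for two strata, for illustration only.
example_strata = [(10, 20, 30, 40), (15, 5, 20, 60)]
pooled_or = mantel_haenszel_or(example_strata)
```

With a single stratum the estimate reduces to the ordinary odds ratio a*d / (b*c); with several strata, each one is weighted by its size.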

Statistical Methods for Surrogate Marker Validation from Randomized Clinical Trials
Abel T. Eshete, PhD
Research Associate, Harvard School of Public Health, Harvard University

Surrogate endpoints come into play in a number of contexts in place of the endpoint of interest, commonly referred to as the true or main endpoint. Surrogate endpoints are potentially beneficial when they can be measured earlier, leading to more rapid approval of experimental drugs, or can be measured more conveniently, which reduces the burden on both the experimenter and the patients. In recent times, the evaluation exercise has been framed within a meta-analytic setting, thereby overcoming difficulties that necessarily surround evaluation efforts based on a single trial. This approach, however, is computationally intensive and requires different hierarchical models for different types of outcomes. To circumvent this problem, an information-theoretic approach has been introduced that is simple to apply and can be used for a variety of outcome combinations. A particular but important instance is where the true endpoint is the ultimate assessment in a sequence of repeated measures. It is then appealing to consider earlier measures, either in isolation or several combined, as a potential surrogate endpoint. The potential to reduce trial length and cost has to be weighed carefully against the loss in precision and the risk of an inappropriate decision regarding a new compound's fate. In this talk, we will discuss statistical methods used to validate surrogate endpoints of various types within the meta-analytic framework.

Friday, September 23, :00-11:00 am

Flexible Random Effects Copula Models for Clustered Mixed Bivariate Outcomes: Application in Developmental Toxicology
Alexander de Leon, PhD
Associate Professor, Department of Mathematics & Statistics, University of Calgary

The talk concerns the analysis of clustered data with mixed bivariate responses, i.e., where each member of the cluster has a discrete and a continuous outcome. A copula-based random effects model is proposed that accounts for associations between discrete and/or continuous outcomes within and between clusters, including the intrinsic association between the mixed outcomes for the same subject. The approach yields regression parameters in models for both outcomes that are marginally meaningful; in addition, by assuming a latent variable framework to describe discrete outcomes, complications that arise from direct applications of copulas to discrete variables are avoided. Maximum likelihood estimation of the model parameters is implemented using readily available software (e.g., PROC NLMIXED in SAS), and results of simulations concerning the bias and efficiency of the estimates are reported. The proposed methodology is motivated by and illustrated using a developmental toxicity study of ethylene glycol (EG) in mice.

Friday, October 7, :15-11:15 am
Martin Marietta Conference Room, Lombardi Comprehensive Cancer Center
Refreshments will be provided at 10:00 am

Potential Impact of Population Structure on Population-Based Case-Control Association Studies
Zhaohai Li, PhD
Professor of Statistics and Biostatistics, Department of Statistics, The George Washington University

Case-control association studies using unrelated cases and controls may suffer from potential confounding due to population stratification. Bias and variance distortion caused by population stratification in the commonly used allele-based tests can considerably inflate the Type I error rate. It is shown that the bias vanishes in the absence of disease rate heterogeneity. If only population stratification exists, a proper estimate of the variance of the allele-based test statistic is developed. Using this estimated variance yields a valid Type I error rate. However, when the frequencies of the allele under study and the disease rates both differ among the sub-populations, it is difficult to correct for this bias. Explicit expressions for the excess false positive rate (EFPR) of the test due to bias and variance distortion are derived. It turns out that the bias created when both population stratification and disease rate heterogeneity are present usually has a greater effect on the EFPR than variance distortion. Comprehensive simulation studies strongly support these results.

Friday, October 14, :00-11:00 am
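The claim in the abstract above that the bias vanishes without disease rate heterogeneity can be checked with simple arithmetic. The sketch below is not the speaker's derivation. It assumes no allele-disease association within any sub-population and computes the expected allele frequency among pooled cases and pooled controls; the function name and all numbers are illustrative assumptions.

```python
from typing import Sequence, Tuple


def pooled_allele_freqs(
    weights: Sequence[float],        # share of each sub-population in the source population
    allele_freqs: Sequence[float],   # allele frequency within each sub-population
    disease_rates: Sequence[float],  # disease rate within each sub-population
) -> Tuple[float, float]:
    """Expected allele frequency among cases and among controls when the
    allele is unrelated to disease inside every sub-population."""
    case_mass = [w * r for w, r in zip(weights, disease_rates)]
    control_mass = [w * (1.0 - r) for w, r in zip(weights, disease_rates)]
    case_freq = sum(m * p for m, p in zip(case_mass, allele_freqs)) / sum(case_mass)
    control_freq = sum(m * p for m, p in zip(control_mass, allele_freqs)) / sum(control_mass)
    return case_freq, control_freq


# Two sub-populations with different allele frequencies (stratification).
weights = [0.5, 0.5]
allele_freqs = [0.2, 0.6]

# Equal disease rates: cases and controls have the same expected frequency.
same_cases, same_controls = pooled_allele_freqs(weights, allele_freqs, [0.03, 0.03])

# Unequal disease rates: a spurious case-control difference appears.
het_cases, het_controls = pooled_allele_freqs(weights, allele_freqs, [0.01, 0.05])
```

In the second scenario the cases are drawn disproportionately from the sub-population with the higher allele frequency, so an allele-based comparison shows a difference even though no sub-population has a true association.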

Likelihood-Based Methods for Regression Analysis with Binary Exposure Status Assessed by Pooling
Robert H. Lyles, PhD
Associate Professor, Department of Biostatistics and Bioinformatics
Rollins School of Public Health, Emory University

The need for resource-intensive laboratory assays to assess exposures in many epidemiologic studies provides ample motivation to consider study designs that incorporate pooled samples. In this talk, we consider the case in which specimens are combined for the purpose of determining the presence or absence of a pool-wise exposure, in lieu of assessing the actual binary exposure status for each member of the pool. We presume a primary logistic regression model for an observed binary outcome, together with a secondary regression model for exposure. We facilitate maximum likelihood analysis by complete enumeration of the possible implications of a positive pool, and we discuss the applicability of this approach under both cross-sectional and case-control sampling. We also provide a maximum likelihood approach for longitudinal or repeated measures studies where the binary outcome and exposure are assessed on multiple occasions and within-subject pooling is conducted for exposure assessment. Simulation studies illustrate the performance of the proposed approaches along with their computational feasibility using widely available software. We apply the methods to investigate gene-disease association in a population-based case-control study of colorectal cancer.

Friday, October 28, :00-11:00 am
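To give a sense of what "complete enumeration of the possible implications of a positive pool" can look like, here is a minimal sketch. It is not the speaker's likelihood. It assumes that exposures of pool members are independent with known individual probabilities, and the function name is hypothetical.

```python
from itertools import product
from typing import Dict, Sequence, Tuple


def positive_pool_configurations(
    probs: Sequence[float],
) -> Tuple[Dict[Tuple[int, ...], float], float]:
    """Enumerate every exposure pattern consistent with a positive pool.

    Returns (conditional probability of each pattern given a positive pool,
    probability that the pool tests positive). Assumes independent members.
    """
    joint: Dict[Tuple[int, ...], float] = {}
    for config in product((0, 1), repeat=len(probs)):
        if not any(config):
            continue  # all-unexposed pattern would give a negative pool
        weight = 1.0
        for exposed, p in zip(config, probs):
            weight *= p if exposed else (1.0 - p)
        joint[config] = weight
    p_positive = sum(joint.values())  # equals 1 - prod(1 - p_i)
    if p_positive == 0:
        raise ValueError("a positive pool is impossible with these probabilities")
    conditional = {c: w / p_positive for c, w in joint.items()}
    return conditional, p_positive


# A pool of two members, each exposed with probability 0.5 (made-up values).
conditional, p_positive = positive_pool_configurations([0.5, 0.5])
```

A pool of size k has 2^k - 1 patterns consistent with a positive result, so this kind of enumeration is only practical for small pools.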

Statistical Analysis of Copy Number Variations in Genome-Wide Association Studies
Jianxin Shi, Ph.D.
Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute

Copy number variations (CNVs) are one of the major genetic resources in human populations. CNVs have been reported to be associated with the risks of many complex diseases in both case-control and family-based genome-wide association studies (GWAS) based on single nucleotide polymorphism (SNP) genotyping arrays. In this talk, I will first review the statistical and computational challenges in CNV discovery and CNV association testing in GWAS, which are two fundamentally different problems. Both successful and controversial stories will be discussed. I will then present our methods (implemented in SegCNV) for CNV discovery and CNV association testing in case-control GWAS. Our methods are computationally efficient and statistically powerful for detecting both overlapping and non-overlapping CNVs in GWAS. Finally, I will present a novel method (cnvfam) for simultaneously inferring CNVs in nuclear families of any size. We demonstrated that cnvfam substantially improved the sensitivity of detecting shared CNVs, as well as the boundary inference, compared to methods ignoring family information. In addition, by simultaneously modeling all family members, cnvfam made a biologically more plausible inference of de novo vs. inherited CNVs. Application of the methods to cancer and autism GWAS will be discussed.

Friday, November 11, :00-11:00 am
Refreshments will be provided at 9:45 am

High breakdown generalized S-estimators for unbalanced linear mixed effects models
Speaker: Inna Chervoneva, PhD
Associate Professor, Division of Biostatistics, Thomas Jefferson University

Linear mixed effects (LME) models provide the most important statistical tool for analysis of a wide range of clustered and correlated continuous measurement data in biomedical research. Real biomedical data are often contaminated with outliers, and it is well known that in the presence of outliers, traditional maximum-likelihood-based estimation methods may yield severely biased results that misrepresent the characteristics of the majority of observations. High breakdown and re-descending M-estimators are currently the robust methods of choice for multivariate linear regression, but such estimators have so far been developed only for the limited class of completely balanced LME models. For a general unbalanced linear mixed effects model, we propose a class of re-descending M-estimators that reduce to the high breakdown S-estimators in a fully balanced linear mixed effects model. Therefore, they are referred to as generalized S-estimators. The asymptotic properties, influence function, and breakdown of the proposed estimators are investigated. A small simulation study is conducted to compare the performance of the generalized S-estimates and M-estimates for unbalanced LME models. Finally, the proposed generalized S-estimators are used for robust analysis of age-related changes in hemoglobin levels of sickle cell disease patients.

Friday, January 13, :00-11:00 am

Clinical Trials for Predictive Medicine: New Challenges and Paradigms
Speaker: Richard Simon, D.Sc.
Biometric Research Branch, National Cancer Institute

Development of treatments with companion diagnostics requires major changes in the standard paradigms for the design and analysis of clinical trials. Some of the key assumptions upon which current methods are based are no longer valid. The standard clinical trial paradigm of employing broad eligibility criteria and focusing design and analysis on testing the null hypothesis of no overall average effect no longer has an adequate scientific basis in many oncology settings. It has led to large clinical trials that identified small average treatment effects and resulted in approval of drugs that do not benefit most patients to whom they are administered. This problem has been exacerbated by the development of expensive molecularly targeted therapeutics, which make up the vast majority of drugs currently being developed. The established molecular heterogeneity of human cancer and the availability of genomic technologies for characterizing the molecular basis of individual tumors require the development of new paradigms for the design and analysis of randomized clinical trials as a reliable basis for predictive personalized oncology.

I will review new designs for phase III clinical trials of new therapeutics and candidate companion diagnostics (predictive biomarkers). I will also outline a prediction-based approach to the analysis of randomized clinical trials that both preserves the type I error and provides a reliable, internally validated basis for predicting which patients are most likely or unlikely to benefit from the new treatment. This is a very structured approach whose use requires careful prospective planning. It may serve as a basis for a new generation of predictive clinical trials that provide the kinds of reliable individualized information that physicians and patients have long sought, but that have not been available from the past use of post-hoc subset analysis.

Developing new treatments with predictive biomarkers for identifying the patients who are most likely or least likely to benefit makes drug development more complex. But for many new oncology drugs it is the only science-based approach, and it should increase the chance of success. It may also lead to more consistency in results among trials, and it has obvious benefits for reducing the number of patients who ultimately receive expensive drugs that expose them to risks of adverse events but offer no benefit.

Friday, January 27, :00-11:00 am

Designing an Adaptive Multi-Arm Time-to-Event Clinical Trial Using Weibull Models
Speaker: Alex Sverdlov, PhD
Senior Research Biostatistician, Bristol-Myers Squibb

We consider a design problem for a clinical trial with multiple treatment arms and time-to-event primary outcomes that are modeled using the Weibull family of distributions. We obtain the locally D-optimal design, along with compound optimal designs that deliver targeted efficiencies for different estimation problems while also addressing ethical considerations. The proposed designs are studied theoretically and are implemented using the doubly adaptive biased coin design (DBCD) randomization procedure of Hu & Zhang (2004) for a clinical trial with censored Weibull outcomes. We compare the merits of our multiple-objective adaptive designs with popular competing designs and show that our designs are more flexible, more realistic, generally more ethical, and frequently more efficient for estimating different sets of parameters.

Keywords: clinical trial; D-optimal design; dose response study; doubly adaptive biased coin; ethical concern; randomization design.

Friday, February 10, :00-11:00 am
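As background for the phrase "censored Weibull outcomes" in the abstract above, the sketch below writes out the standard log-likelihood for right-censored Weibull data in one arm. It is not the speaker's design procedure. The shape/scale parameterization S(t) = exp(-(t/scale)^shape) and the function name are assumptions made for illustration.

```python
import math
from typing import Sequence


def weibull_censored_loglik(
    times: Sequence[float],   # follow-up times, all > 0
    events: Sequence[bool],   # True = event observed, False = right-censored
    shape: float,
    scale: float,
) -> float:
    """Log-likelihood of right-censored Weibull data.

    Observed events contribute the log density; censored observations
    contribute the log survival function.
    """
    if shape <= 0 or scale <= 0:
        raise ValueError("shape and scale must be positive")
    loglik = 0.0
    for t, observed in zip(times, events):
        z = (t / scale) ** shape  # cumulative hazard at time t
        if observed:
            loglik += math.log(shape / scale) + (shape - 1.0) * math.log(t / scale) - z
        else:
            loglik += -z
    return loglik


# One observed event at t = 1 and one censored observation at t = 3 (made-up data).
example_loglik = weibull_censored_loglik([1.0, 3.0], [True, False], shape=1.0, scale=2.0)
```

With shape equal to 1 the model reduces to the exponential distribution, which gives a convenient check on the formula.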

Mountains out of molehills? The role of experiment design, quality assurance, and machine learning in effective clinical metabolomics biomarker discovery
Speaker: David Broadhurst, Ph.D.
Assistant Professor of Biostatistics, Division of General Internal Medicine, Department of Medicine, University of Alberta

Many clinical metabolomics, and other high-content or high-throughput, experiments are set up such that the primary aim is the discovery of biomarker metabolites that can discriminate, with a certain level of certainty, between nominally matched case and control samples. However, it is unfortunately very easy to find markers that are apparently persuasive but that are in fact entirely spurious. The main types of danger are not entirely independent of each other, but include: bias in patient selection and biobanking; poor choice of clinical endpoint; inadequate sample size and excessive false discovery rate due to multiple hypothesis testing; inappropriate choice of machine learning methods; poor model validation; and poor lab-based quality assurance protocols. Many studies fail to take these issues into account, and thereby fail to discover anything of true significance (despite their claims). Here I summarise these problems and provide pointers to assist in the improved design and evaluation of metabolomics experiments, thereby allowing robust scientific conclusions to be drawn from the available data.

Friday, February 24, :00-11:00 am
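One of the dangers listed in the abstract above, an excessive false discovery rate from multiple hypothesis testing, is commonly addressed with the Benjamini-Hochberg step-up procedure. The abstract does not name a specific correction, so the sketch below is an illustrative choice rather than the speaker's recommendation. The function name and the p-values are made up.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float], q: float = 0.05) -> List[bool]:
    """Return a rejection flag per hypothesis, controlling the FDR at level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    # Find the largest rank k whose sorted p-value satisfies p_(k) <= k * q / m.
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            cutoff_rank = rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Ten hypothetical metabolite p-values.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
naive_hits = sum(p <= 0.05 for p in p_values)  # unadjusted threshold
bh_hits = sum(benjamini_hochberg(p_values))    # FDR-controlled
```

For these made-up values, the unadjusted 0.05 threshold flags five markers, while the step-up procedure keeps only the two smallest p-values.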

Differential principal component analysis of ChIP-seq
Speaker: Hongkai Ji, PhD
Assistant Professor, Department of Biostatistics
Bloomberg School of Public Health, Johns Hopkins University

We propose Differential Principal Component Analysis (dPCA) for analyzing multiple ChIP-seq datasets to identify differential protein-DNA interactions between two biological conditions. dPCA integrates unsupervised pattern discovery, dimension reduction, and statistical inference into a single statistical framework. It uses a small number of principal components to concisely summarize the major multi-protein differential patterns between the two conditions. For each pattern, it detects and prioritizes differential genomic loci by comparing the between-condition differences with the within-condition variation among replicate samples. dPCA provides a new tool for efficiently analyzing large amounts of ChIP-seq data to study dynamic changes of gene regulation across different biological conditions. We demonstrate this approach through analyses of differential histone modifications at transcription factor binding sites and promoters.

Friday, March 23, :00-11:00 am

Road to Bias Paved by Sensible Intuition: How We Sometimes Assess Exposures in Epidemiology
Speaker: Igor Burstyn, Ph.D.
Associate Professor and Director of Research, Department of Environmental and Occupational Health
School of Public Health, Drexel University

The aim of the presentation is to raise awareness of unanticipated statistical properties of procedures employed by epidemiologists in constructing exposure variables. These ad hoc procedures are typically based on sensible intuition and arguments by analogy, but in the absence of theoretical exploration of their properties they often lead to violations of assumptions in standard exposure-disease analyses. The net result is a biased body of literature that most likely distorts what can and should be learned from epidemiologic data. Two ad hoc procedures with such properties will be illustrated. The first is a job-exposure matrix that aggregates exposure level and prevalence within a job into a single metric. I will demonstrate theoretically that this commonly adopted method leads to systematic and differential measurement error even under idealized conditions of no uncertainty about level and prevalence of exposure. The impact of this bias in a particular study of cancer risk will be illustrated via simulation. The second procedure is the habit of dichotomizing mismeasured continuous covariates for the sake of convenience and parsimony. This is known to create differential exposure misclassification and produces publication bias in situ (i.e., selective reporting of the most significant findings). A newly proposed alternative procedure for presenting all dichotomizations will be presented (PMID: ) and the extent of bias from differential error illustrated. I conclude that there will always be a need for bold intuition to make advances in knowledge, but I urge that all such insights be examined critically and, when they concern statistical properties, with the full engagement of statisticians who can explore the limits and pitfalls of methods that appear to be intuitively sensible.

Friday, April 13, :00-11:00 am
