Heritability. The Extended Liability-Threshold Model. Polygenic model for continuous trait.

The Extended Liability-Threshold Model
Klaus K. Holst <k.k.holst@biostat.ku.dk>, Thomas Scheike, Jacob Hjelmborg
014-05-1

Heritability

Twin studies
Include both monozygotic (MZ) and dizygotic (DZ) twin pairs.
DZ pairs on average share half of their genes.
MZ pairs are natural copies.
A difference in similarity between DZ and MZ twins may indicate genetic influence!

Decomposition
What is the contribution of genetic and environmental factors to the variation in the outcome?
The phenotype is the sum of genetic and environmental effects:
$$Y = G + E, \qquad \Sigma_Y = \Sigma_G + \Sigma_E$$

Polygenic model for continuous trait

ACDE model
Decompose the outcome into
$$Y_i = A_i + D_i + C + E_i, \quad i = 1, 2$$
A  Additive genetic effects of alleles
D  Dominant genetic effects of alleles
C  Shared environmental effects
E  Unique environmental effects

Assumptions
No gene-environment interaction
No gene-gene interaction (ACE)
Same marginals of twin 1 and twin 2, and of MZ and DZ.
Equal environmental effects for MZ and DZ.

Model
$$Y_i = A_i + C_i + D_i + E_i$$
$$A_i \sim N(0, \sigma_A^2), \quad C_i \sim N(0, \sigma_C^2), \quad D_i \sim N(0, \sigma_D^2), \quad E_i \sim N(0, \sigma_E^2)$$
$$\operatorname{Cov}(Y_1, Y_2) =
\begin{pmatrix} \sigma_A^2 & Z_A \sigma_A^2 \\ Z_A \sigma_A^2 & \sigma_A^2 \end{pmatrix} +
\begin{pmatrix} \sigma_C^2 & \sigma_C^2 \\ \sigma_C^2 & \sigma_C^2 \end{pmatrix} +
\begin{pmatrix} \sigma_D^2 & Z_D \sigma_D^2 \\ Z_D \sigma_D^2 & \sigma_D^2 \end{pmatrix} +
\begin{pmatrix} \sigma_E^2 & 0 \\ 0 & \sigma_E^2 \end{pmatrix}$$
where
$$Z_A = \begin{cases} 1, & \text{MZ} \\ 0.5, & \text{DZ} \end{cases}
\qquad \text{and} \qquad
Z_D = \begin{cases} 1, & \text{MZ} \\ 0.25, & \text{DZ} \end{cases}$$
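The covariance structure of the ACDE model can be sketched in a few lines of base R. This is only an illustration: `twin_cov` is a hypothetical helper (it is not part of the mets package), and the variance components are made-up values, not estimates from the talk.

```r
# Hypothetical helper: 2 x 2 covariance matrix of (Y1, Y2) under the ACDE model
twin_cov <- function(zyg = c("MZ", "DZ"), var_a, var_c, var_d, var_e) {
  zyg <- match.arg(zyg)
  z_a <- if (zyg == "MZ") 1 else 0.5    # Z_A: additive genetic correlation
  z_d <- if (zyg == "MZ") 1 else 0.25   # Z_D: dominance genetic correlation
  var_a * matrix(c(1, z_a, z_a, 1), 2) +
    var_c * matrix(1, 2, 2) +           # shared environment: fully correlated
    var_d * matrix(c(1, z_d, z_d, 1), 2) +
    var_e * diag(2)                     # unique environment: uncorrelated
}

# Made-up variance components (ACE model, so var_d = 0)
S_mz <- twin_cov("MZ", var_a = 0.5, var_c = 0.2, var_d = 0, var_e = 0.3)
S_dz <- twin_cov("DZ", var_a = 0.5, var_c = 0.2, var_d = 0, var_e = 0.3)

# Implied within-pair correlations
rho_mz <- cov2cor(S_mz)[1, 2]
rho_dz <- cov2cor(S_dz)[1, 2]
```

With these inputs the MZ correlation exceeds the DZ correlation only through the genetic terms, which is the contrast the twin design exploits.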

Polygenic model

[Path diagram: latent components A_1, D_1, E_1, C, A_2, D_2, E_2 with loadings λ_A, λ_D, λ_C, λ_E on Y_1 and Y_2; correlation between twins DZ 0.5 / MZ 1 for A and DZ 0.25 / MZ 1 for D]

Heritability

(Broad-sense) Heritability
$$h^2 = \frac{\operatorname{Var}(G)}{\operatorname{Var}(Y)} = \frac{\sigma_A^2 + \sigma_D^2}{\sigma_A^2 + \sigma_C^2 + \sigma_D^2 + \sigma_E^2}$$
Shared environmental effect
$$c^2 = \frac{\sigma_C^2}{\sigma_A^2 + \sigma_C^2 + \sigma_D^2 + \sigma_E^2}$$
In the ACE model the heritability is given by
$$h^2 = 2(\rho_{MZ} - \rho_{DZ})$$

Liability model/threshold model for binary data
For the dichotomous prostate cancer status outcome (cancer or death without cancer) we can use a Liability Model.
Let Y_1 and Y_2 be the cancer status of the two twins. Model based on Probit link:
$$P(Y_1 = 1, Y_2 = 1 \mid X) = \Phi(\beta^T X_1 + A_1 + C_1, \; \beta^T X_2 + A_2 + C_2)$$
where Φ is the bivariate standard normal CDF, i.e.
$$Y_i = \begin{cases} 1, & Y_i^* > 0 \\ 0, & Y_i^* \le 0 \end{cases}
\qquad Y_i^* = \beta^T X_i + A_i + C_i + E_i, \quad \operatorname{Var}(E_i) = 1.$$
$$P(Y_1 = 1) = P(Y_1^* > \tau_1)$$

[Figure: density of the liability Y* with threshold τ; Y=0 below the threshold, Y=1 above]
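The ACE relation between heritability and the twin correlations can be checked by simulation. In the ACE model the MZ correlation is a² + c² and the DZ correlation is a²/2 + c², so twice their difference recovers a². The sketch below assumes made-up variance components and a continuous trait; `sim_pairs` is a hypothetical helper written for this illustration only.

```r
set.seed(1)
n <- 1e5                                   # pairs per zygosity (arbitrary)
var_a <- 0.5; var_c <- 0.2; var_e <- 0.3   # made-up components, total variance 1

# Hypothetical helper: simulate n twin pairs with additive genetic correlation z_a
sim_pairs <- function(n, z_a) {
  a_shared <- rnorm(n, sd = sqrt(z_a * var_a))   # genetic part common to both twins
  c_shared <- rnorm(n, sd = sqrt(var_c))         # shared environment
  sapply(1:2, function(j) {
    a_own <- rnorm(n, sd = sqrt((1 - z_a) * var_a))  # twin-specific genetic part
    a_shared + a_own + c_shared + rnorm(n, sd = sqrt(var_e))
  })
}

y_mz <- sim_pairs(n, z_a = 1)     # MZ: identical additive genetic effect
y_dz <- sim_pairs(n, z_a = 0.5)   # DZ: half of the additive variance shared

rho_mz <- cor(y_mz)[1, 2]
rho_dz <- cor(y_dz)[1, 2]
h2 <- 2 * (rho_mz - rho_dz)       # should be close to var_a / total variance
```

For a binary outcome the same relation is applied to the tetrachoric correlations of the liabilities rather than to the correlations of the observed 0/1 status.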

Liability model/threshold model for binary data
$$P(Y_1 = 1, Y_2 = 1) = P(Y_1^* > \tau_1, Y_2^* > \tau_2)$$

[Figure: the (Y_1^*, Y_2^*) liability plane split by the thresholds τ_1 and τ_2 into four quadrants: (Y_1 = 0, Y_2 = 1), (Y_1 = 1, Y_2 = 1), (Y_1 = 0, Y_2 = 0), (Y_1 = 1, Y_2 = 0)]

Liability model in R
The bivariate probit model, biprobit, and the Liability model, bptwin, are available in R via the mets package:
> library(mets)
> data(twinstut)
> twinstut <- subset(twinstut, zyg!="os")
> biprobit(stutter ~ sex+strata(zyg), data=twinstut, id="tvparnr")
> bptwin(stutter ~ sex+strata(zyg), data=twinstut, id="tvparnr", zyg="zyg", DZ="dz")
Statistical inference via compare (LRT, Wald test), confint, AIC, ...

Heritability of prostate cancer
Liability Model for censored data
There is considerable interest in quantifying the genetic influence on prostate cancer.
Lichtenstein et al (2000). Environmental and heritable factors in the causation of cancer. NEJM 343(2):78-85.
Reported case-wise concordance rates (MZ; DZ) of 0.0; 0.09, and a heritability of 0.42 (0.29; 0.50).

Prostate cancer - Danish cohorts

Number of pairs at time of follow-up (MZ & DZ)

Status               Prostate cancer   No cancer, dead   No cancer, alive
Prostate cancer      6 & 15            0                 101
No cancer and dead   100               14 & 314          139
No cancer and alive  38                54                404 & 6884

Table: Number of pairs by status at time of follow-up with MZ pairs in lower left triangle (colored red) and DZ pairs in upper right triangle.

Massive amount of censoring which cannot be ignored!!!

Simulation, Scheike et al 2013, Lifetime Data An.

        Lifetime prevalence (DZ censoring)     Lifetime concordance (DZ censoring)
MZ      0%      8%      55%     8%             0%      8%      55%     8%
0%      0.10    0.101   0.08    0.064          0.034   0.07    0.01    0.018
5%      0.103   0.083   0.063   0.044          0.03    0.04    0.016   0.013
55%     0.086   0.065   0.044   0.05           0.033   0.03    0.013   0.007
8%      0.078   0.054   0.09    0.006          0.069   0.045   0.01    0.00

Table 1: Lifetime disease prevalence and lifetime concordance for MZ twins estimated using the Liability threshold model ignoring censoring, for different censoring patterns of MZ and DZ twins. Cross odds-ratio dependence parameter 3 and 1.5 for MZ and DZ, respectively.

        A (DZ censoring)                       C (DZ censoring)
MZ      0%      7%      55%     8%             0%      8%      55%     8%
0%      0.374   0.1     0.000   0.000          0.000   0.170   0.431   0.58
1%      0.473   0.461   0.046   0.000          0.000   0.06    0.49    0.581
55%     0.67    0.64    0.431   0.000          0.000   0.000   0.167   0.640
8%      0.988   0.979   0.949   0.57           0.000   0.000   0.000   0.494

Table 2: Estimates of variance for genetic (A) and environmental components (C) of the threshold liability model given different censoring patterns for MZ and DZ twins. Cross odds-ratio dependence parameter 3 and 1.5 for MZ and DZ, respectively.

Censoring. Alternatives to Liability model
Modeling time to first prostate cancer
Hazard scale model? Concordance scale? AFT/Tobit?
$$g(T_i) = \beta^T X_i + A_i + C_i + E_i$$
We observe the minimum of the failure time T_i and the censoring time C_i:
$$\tilde{T}_i = \min(T_i, C_i)$$

Multi-state model
Tobit framework complicated by competing risks.

[Diagram: multi-state model with transitions Alive → Prostate cancer with intensity α_12(t), Alive → Dead with intensity α_13(t), Prostate cancer → Dead with intensity α_23(t)]

Estimating equations for the Liability model
Nice properties of the Probit model give us closed forms of the marginals of the twin-pair observation Y = (Y_1, Y_2)^T (up to evaluation of the bivariate CDF).
$$U(\theta) = U(\theta; Y) = \frac{1}{\Phi_{\mu_\theta,\Sigma_\theta}(0)} \left[ \frac{1}{2} \left( \frac{\partial \operatorname{vec} \Sigma_\theta}{\partial \theta^T} \right)^T \left\{ -\operatorname{vec}(\Sigma_\theta^{-1}) \, \Phi_{\mu_\theta,\Sigma_\theta}(0) + (\Sigma_\theta^{-1} \otimes \Sigma_\theta^{-1}) \operatorname{vec}\{V_{\mu_\theta,\Sigma_\theta}(0)\} \right\} + \left( \frac{\partial \operatorname{vec} \mu_\theta}{\partial \theta^T} \right)^T \Sigma_\theta^{-1} \operatorname{vec}\{M_{\mu_\theta,\Sigma_\theta}(0)\} \right]$$
$$L = \operatorname{diag}\{2Y_1 - 1, \, 2Y_2 - 1\}, \qquad \Sigma_\theta = \Sigma_\theta(Y) = L \Sigma_\theta^0 L^T, \qquad \mu_\theta = \mu_\theta(Y) = L \mu_\theta^0$$

Define the indicator of an observation being an actual event time,
$$\Delta_{ij} = I\{T_{ij} \le C_i\},$$
for twin j = 1, 2 of the ith twin-pair, and with the same random censoring C_i (independent of T_ij given possible covariates) for the twin-pair.
Model for the censoring mechanism:
$$G_c(t; Z_i) = P(C_i > t \mid Z_i)$$
Weights based on a parametric survival model, stratified Kaplan-Meier, Cox Proportional Hazards Model, Aalen's Additive Model, ...
Define the IPCW Estimating Equation based on complete pairs only:
$$U_w(\theta) = \sum_{i=1}^{n} \frac{\Delta_{i1} \Delta_{i2}}{\min\{G_c(T_{i1}; Z_i), \, G_c(T_{i2}; Z_i)\}} U_i(\theta)$$
OBS: positivity assumption!
With a correct model for the weights we obtain consistency:
$$E(U_w(\theta)) = E\{E(U_w(\theta) \mid Z, X)\} = E(U_i(\theta)) = 0$$
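The effect of inverse probability of censoring weighting can be illustrated with a small simulation. To keep it short, this sketch makes several assumptions that are not in the talk: it uses single individuals rather than twin pairs, made-up exponential event and censoring times, and a known censoring distribution G_c (in practice G_c is estimated, for example by Kaplan-Meier). The estimating equation solved here is the simple weighted mean equation for the lifetime prevalence.

```r
set.seed(2)
n <- 2e5
p_true <- 0.1                                # made-up lifetime prevalence

# True lifetime status: 1 = cancer, 0 = death without cancer
cancer <- rbinom(n, 1, p_true)
# Made-up timing: cancer tends to occur earlier than death without cancer
t_event <- ifelse(cancer == 1, rexp(n, rate = 1 / 5), rexp(n, rate = 1 / 20))
cens <- rexp(n, rate = 1 / 60)               # independent right censoring

delta <- as.numeric(t_event <= cens)         # 1 if the lifetime status is observed
G_c <- function(t) exp(-t / 60)              # P(C > t), known in this sketch

# Naive estimate: complete cases only, censoring ignored
p_naive <- mean(cancer[delta == 1])

# IPCW estimate: solve sum_i w_i (Y_i - p) = 0 with w_i = delta_i / G_c(T_i)
w <- delta / G_c(t_event)
p_ipcw <- sum(w * cancer) / sum(w)
```

Because early events are less likely to be censored, the complete cases over-represent cancer and the naive estimate is biased upwards, while the weighted estimate is close to the true value. For twin pairs with common censoring the weight would instead use the smaller of the two survival probabilities, as in the estimating equation above.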

Estimating equations for the Liability model
Under mild regularity conditions the estimates, $\hat\theta$, from solving U_w are consistent and asymptotically normal, following from the i.i.d. decomposition
$$\sqrt{n}\{\hat\theta - \theta\} = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \epsilon_i + o_p(1)$$
and asymptotic variance (sandwich type)
$$\frac{1}{n} \sum_{i=1}^{n} \epsilon_i^{\otimes 2}$$
Estimates of concordance, tetrachoric correlation, heritability, etc. can be obtained by applying the Delta theorem (using variance stabilizing transformations tanh, logit, ...).

Implementation
> library(mets)
> dw <- ipw(Surv(time,status==0)~strata(country), data=prostatedata)
> a <- bptwin(cancer ~ country, data=dw, id="tvparnr", zyg="zyg", DZ="DZ", weight="w", type="ace")
> summary(a)

             Estimate   Std.Err   Z          p-value
(Intercept)  -.37835    0.094033  -5.33941   0.0000
landfinland  0.605013   0.067857  8.91595    0.0000
landnorway   0.664485   0.070300  9.451      0.0000
landsweden   0.55765    0.05939   10.56493   0.0000
log(var(a))  0.6061     0.1788    .04466     0.0409
log(var(c))  -6.669063  6.054746  -4.404655  0.0000

Total MZ/DZ   Complete pairs MZ/DZ
8585/17115    3161/5894
...

Implementation, type="ace"

A                0.56470  0.5070   0.6475
C                0.00000  0.00000  0.00000
E                0.43530  0.3755   0.49730

Correlation MZ   0.56470  0.5007   0.691
Correlation DZ   0.835    0.5141   0.317

MZ:
Concordance      0.01774  0.01446  0.0174
Conditional      0.3009   0.601    0.34766
Marginal         0.05873  0.057    0.06538
DZ:
Concordance      0.00878  0.007    0.01068
Conditional      0.14954  0.13389  0.16666
Marginal         0.05873  0.057    0.06538

Heritability     0.56470  0.5070   0.6475

Comparison with competing risks model (ref:se)

[Figure: Cumulative Incidence Function (MZ); probability against time; curves: IPW, Naive, bicomprisk]
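As an illustration of the delta-method step, a confidence interval for a correlation can be built on the atanh scale and mapped back with tanh, which keeps the limits inside (-1, 1). The function `cor_ci` is a hypothetical helper written for this sketch, and the inputs are made-up numbers; it is not how the mets package is known to compute its intervals.

```r
# Hypothetical helper: Wald-type interval for a correlation via the atanh transform
cor_ci <- function(r, se_r, level = 0.95) {
  z <- atanh(r)
  # Delta method: d/dr atanh(r) = 1 / (1 - r^2)
  se_z <- se_r / (1 - r^2)
  q <- qnorm(1 - (1 - level) / 2)
  # Symmetric interval on the transformed scale, back-transformed with tanh
  tanh(z + c(-1, 1) * q * se_z)
}

# Made-up estimate and standard error
ci <- cor_ci(r = 0.5, se_r = 0.05)
```

The resulting interval is asymmetric around the estimate on the original scale; the logit transform plays the same role for probabilities such as the concordance.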

Comparison with bivariate competing risks model

[Figure: Concordance Prostate cancer (MZ); probability against time (55 to 90); curves: bptwin, bicomprisk]

Implementation. type="cor"

Correlation MZ   0.56718  0.49546  0.6311
Correlation DZ   0.7713   0.044    0.34680

MZ:
Concordance      0.01784  0.01441  0.007
Conditional      0.30378  0.5774   0.35413
Marginal         0.0587   0.0571   0.06537
DZ:
Concordance      0.00865  0.00661  0.0113
Conditional      0.14736  0.11868  0.18154
Marginal         0.0587   0.0571   0.06537

Heritability     0.58009  0.3808   0.75669

Implementation. type="flex"

Correlation MZ   0.57110  0.4999   0.6346
Correlation DZ   0.7485   0.053    0.34419

MZ:
Concordance      0.01854  0.01369  0.0507
Conditional      0.30886  0.5980   0.3666
Marginal         0.06003  0.04953  0.0758
DZ:
Concordance      0.00845  0.0069   0.01136
Conditional      0.14546  0.11645  0.180
Marginal         0.05811  0.05099  0.06615

Heritability     0.5950   0.3970   0.76577

R Implementation
Stratified analysis
> a <- bptwin(cancer ~ strata(country), data=dw, id="tvparnr", zyg="zyg", DZ="DZ", weight="w", type="cor")
Optimization
> mean(score(a)^2) ## Close to zero?
> a$opt ## Messages from optimization routine
> a <- bptwin(..., control=list(trace=1,iter.max=100, start=mystart,grtol=1e-10,...))

Hypothesis testing
The estimator is not an MLE, so the usual likelihood ratio tests are no longer an option. Instead, general linear hypotheses of the form
$$H_0: B\theta = \theta_0 \qquad (1)$$
can be tested using a Wald test
$$(B\hat\theta - \theta_0)^T (B \Sigma_{\hat\theta} B^T)^{-1} (B\hat\theta - \theta_0) \sim \chi^2_{\operatorname{rank}(B)} \qquad (2)$$
In R we can specify the contrast matrix B (see matrix, cbind, diag) and use the compare function:
> compare(a,contrast=B,null=b0)
> compare(a,par) ## Index of parameters to test b[i]=0

Cumulative Heritability
Introducing time again...
$$\Pr(Y_1 = 1, Y_2 = 1, T_1 \le \tau, T_2 \le \tau) = \Phi(A_1^\tau + C_1^\tau, \; A_2^\tau + C_2^\tau)$$
> twinlm.time(cancer~1, zyg="zyg", DZ="DZ", id="id", type="ace", data=prt, cens.formula=Surv(time,status==0)~zyg, breaks=seq(70,100,by=4))

[Figure: Prostate Cancer Occurrence, Danish Twin Cohort. Left panel: probability against age (70 to 100), curves F_1(t), C_MZ(t), C_DZ(t). Right panel: relative recurrence risk against age (70 to 100), MZ and DZ]
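The Wald statistic for a general linear hypothesis is easy to compute directly from an estimate and its covariance matrix. The sketch below uses base R only; `wald_test` is a hypothetical helper, not the compare function itself, and the estimates and standard errors are made-up values chosen for illustration.

```r
# Hypothetical helper: Wald test of H0: B theta = theta0
wald_test <- function(theta_hat, Sigma_hat, B, theta0 = rep(0, nrow(B))) {
  d <- B %*% theta_hat - theta0                        # deviation from the null
  W <- drop(t(d) %*% solve(B %*% Sigma_hat %*% t(B)) %*% d)
  df <- qr(B)$rank                                     # degrees of freedom = rank(B)
  c(statistic = W, df = df, p.value = pchisq(W, df, lower.tail = FALSE))
}

# Made-up estimates with a diagonal covariance matrix
theta_hat <- c(1.0, 1.2, 0.7)
Sigma_hat <- diag(c(0.10, 0.10, 0.20)^2)

# Contrast matrix testing theta_1 = theta_2 = theta_3 (two linear restrictions)
B <- rbind(c(1, -1, 0),
           c(0, 1, -1))
res <- wald_test(theta_hat, Sigma_hat, B)
```

Each row of B encodes one linear restriction, so testing that a single parameter equals zero corresponds to a one-row B with a 1 in that position.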

[Figure: Prostate Cancer Occurrence, Danish Twin Cohort; heritability (0 to 1) against age (70 to 100) for the ADE and ACE models]

Heritability under the presence of censoring
Censoring in cancer registers cannot be ignored.
The IPCW Liability model corresponds to the bivariate competing risks model evaluated at infinity.
Extends the classical polygenic model for dichotomous endpoints (Liability model) to a model with missing data due to right censoring.
Estimation is straightforward in R.
Time effects can be pragmatically modeled using the cumulative heritability.
Limitations: interpretation on the liability scale; lack of identification of the full model; delayed entry.
Estimating equations could be based on the non-parametric same cens. concordance estimator.