The Extended Liability-Threshold Model Klaus K. Holst <k.k.holst@biostat.ku.dk> Thomas Scheike, Jacob Hjelmborg 014-05-1 Heritability Twin studies Include both monozygotic (MZ) and dizygotic (DZ) twin pairs. DZ pairs on averages shares half of their genes MZ pairs are natural copies Difference in similarity of DZ and MZ twins may indicate genetic influence! Decomposition What is contribution of genetic and environmental factors to the variation in the outcome? The phenotype is the sum of genetic and environmental effects: Y = G + E Σ Y = Σ G + Σ E Polygenic model for continuous trait Polygenic model for continuous trait ACDE model Decompose outcome into Model Y i = A i + D i + C + E i, i = 1, A Additive genetic effects of alleles D Dominant genetic effects of alleles C Shared environmental effects E Unique environmental effects Assumptions No gene-environment interaction No gene-gene interaction (ACE) Same marginals of twin 1 and twin, and MZ and DZ. Equal environmental effects for MZ and DZ. Y i = A i + C i + D i + E i A i N (0, σ A), C i N (0, σ C ), D i N (0, σ D), E i N (0, σ E ) ( σ A Z A σa ) Z A σ A where Z A = σ A ( σ + C σc σc σ C Cov(Y 1, Y ) = ) ( σ + D Z D σd Z D σd σd { { 1, MZ 0.5 DZ and Z 1, MZ D = 0.5 DZ ) ( ) σ + E 0 0 σe
Polygenic model Heritability DZ 0.5/ MZ 1 DZ 0.5/ MZ 1 Heritability (Broad-sense) Heritability A 1 D 1 E 1 C A D E λ D λ D λe λ λ A A λ C λ C λ E h Y = Var(G) Var(Y ) = Shared environmental effect σ A + σ D σ A + σ C + σ D + σ E Y 1 Y c Y = σ C σ A + σ C + σ D + σ E X Z In the ACE model the heritability is given by h = (ρ MZ ρ DZ ) Liability model/threshold model for binary data For the dichotomous prostate cancer status outcome (cancer or death without cancer) we can use a Liability Model Let Y 1 and Y be cancer status of the two twins. Model based on Probit link: P(Y 1 = 1, Y = 1 X ) = Φ(β T X 1 + A 1 + C 1, β T X + A + C ) Liability model/threshold model for binary data 0.4 0.3 P(Y 1 = 1) = P(Y 1 > τ 1 ) where Φ is bivariate standard normal CDF, e.g. { 1, Yi > 0 Y i = 0, Yi 0 Y i = β T X i + A i + C i + E i, Var(E i ) = 1. 0. 0.1 0.0 Y=0 Y=1 Y * τ
Liability model/threshold model for binary data Liability model in R P(Y 1 = 1, Y = 1) = P(Y 1 > τ 1, Y > τ ) The bivariate probit model, biprobit, and Liability model, bptwin, is available in R via the mets package > library(mets) > data(twinstut) > twinstut <- subset(twinstut,zyg!="os") Y Y 1 = 0, Y = 1 Y 1 = 1, Y = 1 τ Y 1 = 0, Y = 0 Y 1 = 1, Y = 0 > biprobit(stutter ~ sex+strata(zyg), data=twinstut, id="tvparnr") > bptwin(stutter ~ sex+strata(zyg), data=twinstut, id="tvparnr", zyg="zyg", DZ="dz") Statistical inference via compare (LRT, Wald test), confint, AIC,... τ 1 Y1 Heritability of prostate cancer Liability Model for censored data Considerable interest in quantifying the genetic influence of prostate cancer Lichtenstein et al (000). Environmental and heritable factors in the causation of cancer. NEJM 343():78 85. Reported case-wise concordance rates (MZ; DZ) of 0.0; 0.09, and a heritability of 0.4 (0.9; 0.50).
Prostate cancer - Danish cohorts Simulation, Scheike et al 013, Lifetime Data An. Prostate cancer - Danish cohorts Number of pairs at time of follow-up MZ & DZ Status Prostate cancer No cancer, dead No cancer, alive Prostate cancer 6 & 15 0 101 No cancer and dead 100 14 & 314 139 No cancer and alive 38 54 404 & 6884 Table : Number of pairs by status at time of follow-up with MZ pairs in lower left triangle (colored red) and DZ pairs in upper right triangle. Massive amount of censoring which cannot be ignored!!! Lifetime prevalence Lifetime concordance MZ DZ 0% 8% 55% 8% 0% 8% 55% 8% 0% 0.10 0.101 0.08 0.064 0.034 0.07 0.01 0.018 5% 0.103 0.083 0.063 0.044 0.03 0.04 0.016 0.013 55% 0.086 0.065 0.044 0.05 0.033 0.03 0.013 0.007 8% 0.078 0.054 0.09 0.006 0.069 0.045 0.01 0.00 Table 1: Lifetime disease prevalence and lifetime concordance for MZ twins estimated using Liability threshold model ignoring censoring for different censorings patterns of MZ and DZ twins. Cross odds-ratio dependence parameter 3 and for MZ and DZ, respectively. Simulation, Scheike et al 013, Lifetime Data An. Censoring. Alternatives to Liability model A MZ DZ 0% 7% 55% 8% 0% 8% 55% 8% 0% 0.374 0.1 0.000 0.000 0.000 0.170 0.431 0.58 1% 0.473 0.461 0.046 0.000 0.000 0.06 0.49 0.581 55% 0.67 0.64 0.431 0.000 0.000 0.000 0.167 0.640 8% 0.988 0.979 0.949 0.57 0.000 0.000 0.000 0.494 Table : Estimates of variance for genetic (A) and environmental components (C) of threshold liabilty model given different censoring patterns for MZ and DZ twins. Cross odds-ratio dependence parameter 3 and 1.5 for MZ and DZ, respectively. C Modeling time to first prostate cancer Hazard scale model? Concordance scale? AFT/Tobit? We observe the minimum of the failure time Ti time C i T i = min(t i, C i ) g(t i ) = β T X i + A i + C i + E i and the censoring
Multi-state model Tobit framework complicated by competing risks Estimating equations for the Liability model Nice properties of the Probit model gives us closed forms of the marginals of the twin-pair observation Y = (Y 1, Y ) T (up to evaluation of bivariate CDF). Alive α 13 (t) Dead α 3 (t) U(θ) = U(θ; Y ) ( ) 1 1 vec T Σθ [ = Φ µθ,σ θ (0) θ T vec(σ 1 θ )Φ µ θ,σ θ (0)+ ] (Σ 1 θ Σ 1 θ ) vec{v µ θ,σ θ (0)} + α 1 (t) Prostate cancer ( vec µθ θ T ) T Σ 1 θ vec{m µθ,σ θ (0)} L = diag{y 1 1, Y 1} Σ θ = Σ θ (Y ) = LΣ 0 θ LT, µ θ = µ θ (Y ) = Lµ 0 θ Estimating equations for the Liability model Define indicator of an observation being an actual event time ij = I {T ij C i }, Estimating equations for the Liability model Weights based on parametric survival model, stratified Kaplan-Meier, Cox Proportional Hazards Model, Aalen s Additive Model,... for twin j = 1, of the ith twin-pair, and with same random censoring C i (independent of Tij given possible covariates) for the twin-pair. Model for censoring mechanism G c (t; Z i ) = P(C i > t Z i ) Define IPCW Estimating Equation based on complete pairs only U w (θ) = n c i=1 OBS: positivity assumption! i1 i min{g c (T i1 ; Z i ), G c (T i ; Z i )} U i(θ) With correct model for the weights we obtain consistency E (U w (θ)) = E {E (U w (θ) Z, X )} = E (U i (θ)) = 0
Estimating equations for the Liability model Under mild regularity conditions the estimates, θ, from solving U w are consistent and asymptotically normal following from i.i.d. decomposition n{ θ θ} = 1 n n ɛ i + o p (1) i=1 and asymptotic variance (sandwich type) 1 n n i=1 ɛ Estimates of concordance, tetrachoric correlation, heritability, etc. can be obtained applying the Delta theorem (using variance stabilizing transformations tanh, logit,...) Implementation, type="ace" A 0.56470 0.5070 0.6475 C 0.00000 0.00000 0.00000 E 0.43530 0.3755 0.49730 Correlation MZ 0.56470 0.5007 0.691 Correlation DZ 0.835 0.5141 0.317 MZ: Concordance 0.01774 0.01446 0.0174 Conditional 0.3009 0.601 0.34766 Marginal 0.05873 0.057 0.06538 DZ: Concordance 0.00878 0.007 0.01068 Conditional 0.14954 0.13389 0.16666 Marginal 0.05873 0.057 0.06538 Heritability 0.56470 0.5070 0.6475 Implementation > library(mets) > dw <- ipw(surv(time,status==0)~strata(country), data=prostatedata) > a <- bptwin(cancer ~ country, data=dw, id="tvparnr", zyg="zyg", DZ="DZ", weight="w", type="ace") > summary(a) Estimate Std.Err Z p-value (Intercept) -.37835 0.094033-5.33941 0.0000 landfinland 0.605013 0.067857 8.91595 0.0000 landnorway 0.664485 0.070300 9.451 0.0000 landsweden 0.55765 0.05939 10.56493 0.0000 log(var(a)) 0.6061 0.1788.04466 0.0409 log(var(c)) -6.669063 6.054746-4.404655 0.0000 Total MZ/DZ Complete pairs MZ/DZ 8585/17115 3161/5894......... Comparison with competing risks model (ref:se) Probability 0.00 0.05 0.10 0.15 0.0 Cumulative Incidence Function (MZ) 60 70 80 90 100 Time IPW Naive bicomprisk
Comparison with bivariate competing risks model Implementation. type="cor" Probability 0.00 0.01 0.0 0.03 0.04 0.05 0.06 Concordance Prostate cancer (MZ) bptwin bicomprisk Correlation MZ 0.56718 0.49546 0.6311 Correlation DZ 0.7713 0.044 0.34680 MZ: Concordance 0.01784 0.01441 0.007 Conditional 0.30378 0.5774 0.35413 Marginal 0.0587 0.0571 0.06537 DZ: Concordance 0.00865 0.00661 0.0113 Conditional 0.14736 0.11868 0.18154 Marginal 0.0587 0.0571 0.06537 Heritability 0.58009 0.3808 0.75669 55 60 65 70 75 80 85 90 Time Implementation. type="flex" R Implementation Correlation MZ 0.57110 0.4999 0.6346 Correlation DZ 0.7485 0.053 0.34419 MZ: Concordance 0.01854 0.01369 0.0507 Conditional 0.30886 0.5980 0.3666 Marginal 0.06003 0.04953 0.0758 DZ: Concordance 0.00845 0.0069 0.01136 Conditional 0.14546 0.11645 0.180 Marginal 0.05811 0.05099 0.06615 Stratified analysis > a <- bptwin(cancer ~ strata(country), data=dw, id="tvparnr", zyg="zyg", DZ="DZ", weight="w", type="cor") Optimization > mean(score(a)^) ## Close to zero? > a$opt ## Messages from optimization routime > a <- bptwin(..., control=list(trace=1,iter.max=100, start=mystart,grtol=1e-10,...)) Heritability 0.5950 0.3970 0.76577
Hypothesis testing Cumulative Heritability Estimator is not MLE and usual likelihood ratio test are no longer an option. Instead general linear hypotheses of the form H 0 : Bθ = θ 0 (1) Introducing time again... Pr(Y 1 = 1, Y = 1, T 1 τ,t τ) = Φ(A τ 1 + C τ 1, A τ + C τ ) can be tested using a Wald test (B θ θ 0 ) T (BΣ θb T ) 1 (B θ θ 0 ) χ rank(b) () In R we can specify the contrast matrix B (see matrix,cbind,diag) and use the compare function > compare(a,contrast=b,null=b0) > compare(a,par) ## Index of parameters to test b[i]=0 > twinlm.time(cancer~1,zyg="zyg",dz="dz",id="id", type="ace",data=prt, cens.formula=surv(time,status==0)~zyg, breaks=c(70,100,by=4)) Prostate Cancer Occurence, Danish Twin Cohorte Prostate Cancer Occurence, Danish Twin Cohorte Probability 0.00 0.0 0.04 0.06 0.08 F 1 (t) C MZ (t) C DZ (t) F 1 (t) Relative recurrence risk 0 5 10 15 0 5 30 MZ DZ 70 75 80 85 90 95 100 70 75 80 85 90 95 100 Age Age
Prostate Cancer Occurence, Danish Twin Cohorte Heritability under the presence of censoring Heritability 0.0 0. 0.4 0.6 0.8 1.0 ADE ACE 70 75 80 85 90 95 100 Censoring in cancer registers cannot be ignored The IPCW Liability model corresponds to the bivariate competing risks model evaluated at infinity Extends the classical polygenic model for dichotomous endpoints (Liability model) to model with missing data due to right censoring Estimation straightforward in R Time effects can be pragmatically modeled using the cumulative heritability Limitations: Interpretation on the liablity scale; lack of identification of full model; delayed entry Estimating equations could be based on the non-parametric same cens. concordance estimator Age