Supplementary Figure 1 Replicability of blood eQTL effects in ileal biopsies from the RISK study. eQTLs detected in the vicinity of SNPs associated with IBD tend to show concordant effect sizes and directions in blood and ileum. The effects of the 136 eQTLs available in ileum are shown (Supplementary Table 2). The x axis shows the values for the eQTLs detected in peripheral blood from the Blood eQTL browser; the y axis shows the values in the eQTL mapping study with ileal biopsies from the RISK study (see "Mapping study in the RISK cohort to build the ileal TRS" in the Online Methods). The dashed line is the best-fitting least-squares regression line; the correlation between the two tissues is Spearman r = 0.54 (P = 2 × 10⁻¹¹). Values in the corners indicate the percentage of loci in each quadrant, showing that 70% of loci are concordant in direction of effect in the two tissues (P = 1.7 × 10⁻⁶, sign test).
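The two statistics reported in this legend can be sketched as follows, as a minimal illustration assuming the per-locus effect sizes in blood and ileum are available as two parallel lists. The sign test treats each locus as a fair coin flip under the null (probability of concordant direction = 0.5) and computes an exact two-sided binomial P value; Spearman r is computed as the Pearson correlation of average ranks. All function names are hypothetical, and the counts in the usage line (95 of 136 loci, about 70%) are illustrative rather than the exact counts behind the reported P value.

```python
import math


def sign_test_two_sided(n_concordant: int, n_total: int) -> float:
    """Exact two-sided sign test of H0: P(concordant direction) = 0.5."""
    k = max(n_concordant, n_total - n_concordant)
    # Upper-tail binomial probability P(X >= k) with p = 0.5
    upper = sum(math.comb(n_total, i) for i in range(k, n_total + 1)) / 2 ** n_total
    # Double the larger tail; cap at 1 when the split is exactly even
    return min(1.0, 2 * upper)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_r(x, y):
    """Spearman correlation as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)


def concordant_count(blood, ileum):
    """Number of loci whose effects have the same sign in both tissues."""
    return sum(1 for b, i in zip(blood, ileum) if b * i > 0)


if __name__ == "__main__":
    # Illustrative counts only: 95 of 136 loci (about 70%) concordant
    print(sign_test_two_sided(95, 136))
```

Ties are given average ranks so that the rank-based correlation stays well defined when effect sizes repeat.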
Supplementary Figure 2 Performance of the GRS and TRS based on the initial set of 157 candidate genes. (a,b) Each plot shows the TRS based on 157 IBD genes associated with 96 eQTLs that are also associated with IBD or are in LD with a SNP associated with the disease (Supplementary Table 1). The discriminatory performance of the GRS versus the TRS based on these genes is shown for disease status, comparing samples with Crohn's disease (n = 210) with controls (n = 35) (a), and for disease course over the 3-year period after diagnosis, comparing samples that remain in non-complicated Crohn's disease (B1; n = 183) with those that develop complicated disease (B2 and/or B3; n = 27) (b). The standardized GRS and TRS are shown on the y axis. Differences between groups (in s.d. units) and P values (two-sided t test) are reported for each comparison.
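The comparison reported in this legend (a standardized score, a group difference in s.d. units and a two-sided t test) can be sketched as below. This is a minimal sketch assuming that each score is z-scored across all samples in the comparison and that the t test is the pooled-variance Student test; the legend does not state whether a pooled or Welch test was used, so that choice is an assumption. To stay dependency-free, the two-sided P value is computed from the regularized incomplete beta function, P(|T| ≥ |t|) = I_{df/(df+t²)}(df/2, 1/2), evaluated with the standard continued-fraction method; the function names are hypothetical.

```python
import math
from statistics import mean, stdev, variance


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log(1.0 - x))
    front = math.exp(log_front)
    # Use the symmetry I_x(a, b) = 1 - I_{1-x}(b, a) where it converges faster
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def student_t_test(group1, group2):
    """Pooled-variance two-sample t test; returns (t, df, two-sided P)."""
    n1, n2 = len(group1), len(group2)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / df
    t = (mean(group1) - mean(group2)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    p = reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))
    return t, df, p


def standardized_difference(group1, group2):
    """Difference in group means after z-scoring the score over all samples."""
    pooled = list(group1) + list(group2)
    mu, sd = mean(pooled), stdev(pooled)
    return mean((v - mu) / sd for v in group1) - mean((v - mu) / sd for v in group2)
```

Because z-scoring is a linear transform, the difference in s.d. units is simply the raw mean difference divided by the pooled-sample standard deviation, and the t statistic is unchanged by standardization.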
Supplementary Figure 3 Selection of genes based on SMR and coloc results. (a) Each point represents the −log10(P value) (NLP) for the blood eQTL association and the Crohn's disease GWAS association for each of the 157 candidate genes. Colors represent the significance of the SMR statistic, showing that the most significant genes are strongly associated with both traits. Similar plots are observed for ulcerative colitis and IBD. All 39 genes with SMR P < 2.3 × 10⁻⁴ (red and brown dots) for all three disease classifications were included in the final SMR-based TRS. (b) The coloc H4 score estimates the posterior probability that the same causal variant drives both the GWAS and eQTL associations. This plot shows that genes with weak SMR values (small NLPs) also tend to have low coloc H4 scores; however, only approximately half of the genes with strong SMR values (large NLPs) have high coloc H4 posterior probabilities. The 29 genes with coloc H4 greater than 0.8 for all three disease phenotypes were included in the final coloc-based TRS; 14 of these genes are not in the SMR set.
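The two selection rules described in this legend can be sketched as set filters. This is a minimal sketch assuming the SMR P values and coloc H4 posteriors are stored per gene and per phenotype in a nested dictionary; the phenotype labels, the dictionary layout and the gene names in the usage block are hypothetical, while the two thresholds (SMR P < 2.3 × 10⁻⁴ and H4 > 0.8, each required for all three disease classifications) are taken from the legend.

```python
import math

# Thresholds quoted in the legend
SMR_P_THRESHOLD = 2.3e-4
COLOC_H4_THRESHOLD = 0.8
# Hypothetical labels for the three disease classifications
PHENOTYPES = ("crohns_disease", "ulcerative_colitis", "ibd")


def nlp(p_value: float) -> float:
    """Negative log10 P value, as plotted on the axes of panel (a)."""
    return -math.log10(p_value)


def select_genes(results):
    """Return (SMR-selected genes, coloc-selected genes).

    `results` is assumed to map gene -> phenotype -> {"smr_p": float, "h4": float}.
    A gene is kept only if it passes the threshold for every phenotype.
    """
    smr_genes = {
        gene for gene, per_pheno in results.items()
        if all(per_pheno[ph]["smr_p"] < SMR_P_THRESHOLD for ph in PHENOTYPES)
    }
    coloc_genes = {
        gene for gene, per_pheno in results.items()
        if all(per_pheno[ph]["h4"] > COLOC_H4_THRESHOLD for ph in PHENOTYPES)
    }
    return smr_genes, coloc_genes


if __name__ == "__main__":
    def same_for_all(smr_p, h4):
        return {ph: {"smr_p": smr_p, "h4": h4} for ph in PHENOTYPES}

    # Hypothetical genes and values, for illustration only
    example = {
        "GENE_A": same_for_all(1e-6, 0.95),  # passes both filters
        "GENE_B": same_for_all(1e-5, 0.40),  # strong SMR, weak colocalization
        "GENE_C": same_for_all(5e-4, 0.90),  # passes the coloc filter only
        "GENE_D": same_for_all(0.2, 0.05),   # fails both filters
    }
    smr_set, coloc_set = select_genes(example)
    print(sorted(smr_set), sorted(coloc_set), sorted(coloc_set - smr_set))
```

The set difference `coloc_set - smr_set` corresponds to the genes that enter the coloc-based TRS but not the SMR-based one.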
Supplementary Figure 4 Relationship between transcriptional risk scores and location of inflammation. Because the Paris classification of pediatric Crohn's disease includes disease location, which was strongly correlated with the degree of inflammation in the ileum from which biopsies were obtained, we plot here the relationship between disease location and the 29-gene coloc-derived TRS. RISK study patients were classified into two categories according to the presence or absence of visible ileal inflammation in endoscopies performed at diagnosis: L1 (ileum-only) and L3 (ileocolonic) cases were classified as 'inflamed', and L2 (colonic-only) cases were classified as 'non-inflamed'. Only two of the cases that progressed to complicated disease were non-inflamed; these are not shown owing to the small sample size. The TRS is slightly elevated in inflamed versus endoscopically non-inflamed B1 cases (P < 0.02) and is also elevated in B1 cases with non-inflamed ilea compared with non-IBD controls (P < 1 × 10⁻⁶), confirming that the TRS captures a signal that is related but complementary to inflammation. Complicated cases have an elevated TRS even relative to inflamed B1 cases (P < 7 × 10⁻⁴). A box plot of values is shown for each group along with P values for pairwise comparisons (two-sided t test).
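The grouping rule in this legend can be sketched as a small lookup. This is a minimal sketch assuming each sample carries a case/control flag, a Paris location code and a behavior code; the field values accepted for behavior ('B1', 'B2', 'B3') and the function names are hypothetical. Only the L1/L2/L3 mapping and the exclusion of the two non-inflamed complicated cases come from the legend; other Paris codes are deliberately rejected rather than guessed.

```python
from typing import Optional

# Mapping stated in the legend; other Paris codes are not handled here
ILEAL_STATUS = {"L1": "inflamed", "L3": "inflamed", "L2": "non-inflamed"}


def ileal_status(paris_location: str) -> str:
    """Map a Paris location code to visible ileal inflammation at diagnosis."""
    try:
        return ILEAL_STATUS[paris_location]
    except KeyError:
        raise ValueError(f"unhandled location code: {paris_location!r}") from None


def plot_group(is_case: bool, behavior: Optional[str] = None,
               location: Optional[str] = None) -> Optional[str]:
    """Assign a sample to a box-plot group, or None if it is not shown."""
    if not is_case:
        return "non-IBD control"
    status = ileal_status(location)
    if behavior == "B1":
        return f"B1 {status}"
    # B2 and/or B3: complicated disease. The legend omits the non-inflamed
    # complicated cases (n = 2), so they are dropped here.
    return "complicated" if status == "inflamed" else None
```

Pairwise comparisons between the resulting groups would then use the same two-sided t test as the other figures.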
Supplementary Figure 5 Performance of the GRS and TRS based on 39 susceptibility genes detected by SMR. (a,b) Thirty-nine genes were detected by SMR as being under the control of 29 causal variants that account for both the association detected by GWAS and the eQTL effect reported in the Blood eQTL browser (Supplementary Table 4). The performance of the GRS versus the TRS based on these genes is shown for disease status, comparing samples with Crohn's disease (n = 210) with non-IBD controls (n = 35) (a), and for disease course over the 3-year period after diagnosis, comparing samples that remain in non-complicated Crohn's disease (B1; n = 183) with those that develop complicated disease (B2 and/or B3; n = 27) (b). The standardized GRS and TRS are shown on the y axis. Differences between groups (in s.d. units) and P values (two-sided t test) are reported for each comparison.
Supplementary Figure 6 Performance of PRSs based on LD-pruned variants at different significance inclusion thresholds. (a,b) PRSs at different thresholds (Online Methods) successfully separate Crohn's disease cases from non-IBD controls (a) but fail to distinguish patients according to development of complicated disease (b). The performance of PRSs using SNPs that pass a range of liberal P-value thresholds in the GWAS analysis is shown (the inclusion threshold and the total number of variants used are reported on the y axis). Differences between groups (in s.d. units) and P values for each comparison are reported on the x axis.
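A thresholded PRS of the kind summarized in this legend can be sketched as follows. This is a minimal sketch under stated assumptions: the score is taken to be an additive sum of per-SNP weights (for example, log odds ratios) times allele dosages, and LD pruning is illustrated with a simple greedy rule that keeps SNPs in order of GWAS P value and drops any SNP whose r² with an already kept SNP reaches a hypothetical cutoff of 0.2. The actual pruning parameters and thresholds are described in the Online Methods; all names, SNPs and values here are hypothetical.

```python
def greedy_ld_prune(snps, r2, max_r2=0.2):
    """Keep SNPs in order of GWAS P value, dropping any in LD with one already kept.

    `snps` is a list of dicts with "id", "p" and "beta"; `r2` maps
    frozenset({id1, id2}) -> squared correlation (missing pairs treated as 0).
    """
    kept = []
    for snp in sorted(snps, key=lambda s: s["p"]):
        if all(r2.get(frozenset((snp["id"], k["id"])), 0.0) < max_r2 for k in kept):
            kept.append(snp)
    return kept


def polygenic_risk_score(dosages, pruned_snps, p_threshold):
    """Sum of beta * allele dosage over pruned SNPs with GWAS P below the threshold.

    Returns (score, number of variants used), mirroring the y-axis labels.
    """
    included = [s for s in pruned_snps if s["p"] < p_threshold]
    score = sum(s["beta"] * dosages[s["id"]] for s in included)
    return score, len(included)


if __name__ == "__main__":
    # Hypothetical SNPs, LD values, dosages and thresholds for illustration
    snps = [
        {"id": "snp1", "p": 1e-8, "beta": 0.30},
        {"id": "snp2", "p": 1e-5, "beta": 0.25},  # in strong LD with snp1
        {"id": "snp3", "p": 0.01, "beta": 0.10},
    ]
    r2 = {frozenset(("snp1", "snp2")): 0.9}
    pruned = greedy_ld_prune(snps, r2)
    person = {"snp1": 2, "snp2": 2, "snp3": 1}
    for threshold in (1e-3, 0.05, 0.5):
        print(threshold, polygenic_risk_score(person, pruned, threshold))
```

Raising the inclusion threshold only adds variants to the score, which is why each row of the figure reports both the threshold and the number of variants used.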