RNA sequencing of cancer reveals novel splicing alterations
Jeyanthy Eswaran, Anelia Horvath, Sucheta Godbole, Sirigiri Divijendra Reddy, Prakriti Mudvari, Kazufumi Ohshiro, Dinesh Cyanam, Sujit Nair, Suzanne A. W. Fuqua, Kornelia Polyak, Liliana D. Florea & Rakesh Kumar
Supplemental Table 1: NBS global sequencing statistics and read distribution
Normal breast sample (NBS) global mRNA sequencing statistics
| Global statistics | NBS1 | NBS2 | NBS3 |
| Total number of reads | 63636829 | 61418652 | 69648831 |
| Unique reads | 63199920 | 60961838 | 69301083 |
| Aligned reads | 58889160 | 56654064 | 64624628 |
| Unique exons | 156808 | 156716 | 156808 |
| Total exons | 248904 | 241317 | 248904 |
| Transcripts (known and novel) | 21718 | 21724 | 21939 |
| Genes | 15562 | 15605 | 15498 |
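As a quick check on Supplemental Table 1, the counts can be turned into per-sample fractions. This is a sketch, not the authors' calculation: it assumes the total number of reads is the intended denominator, which the table does not state.

```python
# Counts copied from Supplemental Table 1 (normal breast samples).
TABLE1 = {
    "NBS1": {"total": 63636829, "unique": 63199920, "aligned": 58889160},
    "NBS2": {"total": 61418652, "unique": 60961838, "aligned": 56654064},
    "NBS3": {"total": 69648831, "unique": 69301083, "aligned": 64624628},
}


def read_fractions(counts):
    """Return (unique / total, aligned / total) for one sample.

    Dividing by the total number of reads is an assumption made for this sketch.
    """
    total = counts["total"]
    return counts["unique"] / total, counts["aligned"] / total


for sample, counts in TABLE1.items():
    unique_frac, aligned_frac = read_fractions(counts)
    print(f"{sample}: unique {unique_frac:.1%}, aligned {aligned_frac:.1%}")
```

Under this definition each sample has roughly 92 to 93% of its reads aligned and more than 99% of its reads unique.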
Supplemental Table 2: Number of cancer-specific isoforms that align (blastn) with the human open reading frame database, human ORFeome 8.1 (http://horfdb.dfci.harvard.edu/).
| Cancer group | Cancer-specific isoforms expressed only in the cancer subtype | Cancer-specific isoforms aligning with an ORF at >90% identity |
|  | 540 | 246 |
| Non- | 355 | 165 |
| HER2-positive | 588 | 254 |
Supplemental Table 3: Number of genes involved in differential promoter usage and promoter switching events (PSE) in breast cancers compared with NBS
| Differential promoter usage comparison | Statistically significant PSE genes | Number of primary transcripts | Genes that change coding region due to PSE |
| vs. NBS | 138 | 1697 | 75 |
| Non- vs. NBS | 83 | 832 | 44 |
| HER2-positive vs. NBS | 178 | 2690 | 152 |
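Cuffdiff, the tool named in Supplemental Figure 9, scores a shift in the relative abundances of a gene's primary transcripts (differential promoter usage) with the square root of the Jensen-Shannon divergence, reported as sqrt(JS). A minimal sketch of that metric follows; the abundance vectors are hypothetical, base-2 logarithms are chosen here so the value lies between 0 and 1, and Cuffdiff's significance test built on top of the metric is not reproduced.

```python
import math


def entropy(p):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def sqrt_js_divergence(p, q):
    """Square root of the Jensen-Shannon divergence between two abundance vectors."""
    if len(p) != len(q):
        raise ValueError("vectors must have the same length")
    # Normalise raw abundances (e.g. FPKM per primary transcript) to proportions.
    sp, sq = sum(p), sum(q)
    p = [x / sp for x in p]
    q = [x / sq for x in q]
    m = [(a + b) / 2 for a, b in zip(p, q)]
    jsd = entropy(m) - (entropy(p) + entropy(q)) / 2
    return math.sqrt(max(jsd, 0.0))  # guard against tiny negative rounding error


# Hypothetical FPKMs of one gene's two primary transcripts (two promoters).
normal = [30.0, 10.0]
tumour = [5.0, 35.0]
print(round(sqrt_js_divergence(normal, tumour), 3))
```

A value of 0 means the two conditions use the promoters in identical proportions; 1 means they use completely different promoters.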
Supplemental Table 4: Alternative splice events in individual breast cancer samples compared with NBS1 (A), NBS2 (B) and NBS3 (C), determined using Multivariate Analysis of Transcript Splicing (MATS).
Supplemental Table 5: Alternative splice events found in common for each breast cancer type across its comparisons with normal breast samples 1, 2 and 3, determined using MATS
| Comparisons used to identify cancer-specific events | Exon skip | Alternative 5' end | Alternative 3' end | Retained intron |
| vs. NBS | 2 | 0 | 0 | 24 |
| Non- vs. NBS | 7 | 0 | 3 | 37 |
| HER2-positive vs. NBS | 5 | 1 | 4 | 20 |
Supplemental Table 6A: Alternative splice events in merged breast cancer samples compared with merged NBS, determined using MATS
| Merged groups taken for MATS | Alternative 3' | Alternative 5' | Mutually exclusive exon | Intron retention | Exon skip |
| vs. NBS | 405 | 446 | 1549 | 2038 | 2898 |
| Non- vs. NBS | 394 | 443 | 1573 | 2124 | 2811 |
| HER2-positive vs. NBS | 358 | 398 | 1387 | 2148 | 2027 |

Supplemental Table 6B: Switch-like events, i.e., alternative splice events that occur only in merged breast cancer samples or only in merged NBS, determined using MATS (number of events)
| Event type | Event | Group A vs. NBS | Group B vs. NBS | Group C vs. NBS |
| Exon skip | Exclusion in cancer | 2153 | 1357 | 1671 |
| Exon skip | Inclusion in cancer | 22615 | 21453 | 21438 |
| Exon skip | Exclusion in normal | 3900 | 4531 | 4421 |
| Exon skip | Inclusion in normal | 2134 | 2853 | 3595 |
| Alternative 3' end | Exclusion in cancer | 464 | 340 | 403 |
| Alternative 3' end | Inclusion in cancer | 1223 | 984 | 1138 |
| Alternative 3' end | Exclusion in normal | 554 | 610 | 596 |
| Alternative 3' end | Inclusion in normal | 204 | 283 | 252 |
| Alternative 5' end | Exclusion in cancer | 245 | 197 | 252 |
| Alternative 5' end | Inclusion in cancer | 890 | 745 | 788 |
| Alternative 5' end | Exclusion in normal | 447 | 473 | 447 |
| Alternative 5' end | Inclusion in normal | 132 | 162 | 177 |
| Mutually exclusive exon | Exclusion in cancer | 295 | 172 | 213 |
| Mutually exclusive exon | Inclusion in cancer | 262 | 183 | 212 |
| Mutually exclusive exon | Exclusion in normal | 375 | 428 | 420 |
| Mutually exclusive exon | Inclusion in normal | 355 | 401 | 386 |
| Intron retention | Inclusion in cancer | 843 | 679 | 797 |
| Intron retention | Exclusion in normal | 2903 | 2942 | 2857 |
| Intron retention | Inclusion in normal | 128 | 155 | 114 |
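MATS summarises each alternative splicing event by an inclusion level psi, the length-normalised share of reads that support the inclusion form: psi = (I / l_I) / (I / l_I + S / l_S), where I and S are the inclusion and skipping read counts and l_I, l_S their effective lengths. The sketch below computes only that point estimate; the read counts and lengths are hypothetical, and MATS's Bayesian test for a difference in psi between groups is not reproduced.

```python
import math


def inclusion_level(inc_reads, skip_reads, inc_len, skip_len):
    """Length-normalised inclusion level (psi) of one alternative splicing event.

    inc_len and skip_len are the effective lengths of the inclusion and
    skipping forms, i.e. how many read positions can support each form.
    """
    inc = inc_reads / inc_len
    skp = skip_reads / skip_len
    if inc + skp == 0:
        return math.nan  # no reads support either form
    return inc / (inc + skp)


# Hypothetical exon-skipping event measured in a tumour and a normal sample.
psi_tumour = inclusion_level(inc_reads=40, skip_reads=10, inc_len=2, skip_len=1)
psi_normal = inclusion_level(inc_reads=10, skip_reads=30, inc_len=2, skip_len=1)
print(round(psi_tumour, 3), round(psi_normal, 3), round(psi_tumour - psi_normal, 3))
```

The difference in psi between the two conditions is the quantity a differential splicing test assesses.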
Supplemental Table 7: Annotation of splice events in individual breast cancer samples from the, Non- and HER2-positive groups compared with normal breast samples, using direct exon model comparison (columns give the type of splice event)
| Group samples | TSS | TTS | SKIP_ON | SKIP_OFF | MSKIP_ON | MSKIP_OFF |
| 1 | 100668 | 42294 | 159463 | 159463 | 37409 | 37409 |
| 2 | 99272 | 41917 | 160030 | 160030 | 37839 | 37839 |
| 3 | 101755 | 42739 | 161433 | 161433 | 37873 | 37873 |
| 4 | 99858 | 41870 | 159597 | 159597 | 37256 | 37256 |
| 5 | 99818 | 41961 | 160041 | 160041 | 37888 | 37888 |
| 6 | 99999 | 41896 | 160211 | 160211 | 37958 | 37958 |

| Non- group samples | TSS | TTS | SKIP_ON | SKIP_OFF | MSKIP_ON | MSKIP_OFF |
| Non-1 | 97004 | 39179 | 159305 | 159305 | 37569 | 37569 |
| Non-2 | 97378 | 39536 | 160042 | 160042 | 37689 | 37689 |
| Non-3 | 97699 | 39581 | 159338 | 159338 | 37198 | 37198 |
| Non-4 | 97783 | 39579 | 160122 | 160122 | 37728 | 37728 |
| Non-5 | 97865 | 39757 | 159951 | 159951 | 37715 | 37715 |
| Non-6 | 96762 | 38965 | 160556 | 160556 | 37633 | 37633 |

| HER2-positive breast cancer samples | TSS | TTS | SKIP_ON | SKIP_OFF | MSKIP_ON | MSKIP_OFF |
| HER2_1 | 96832 | 38735 | 159225 | 159225 | 37447 | 37447 |
| HER2_2 | 96681 | 38707 | 159606 | 159606 | 37670 | 37670 |
| HER2_3 | 96456 | 38522 | 159677 | 159677 | 37674 | 37674 |
| HER2_4 | 97106 | 38771 | 159578 | 159578 | 37489 | 37489 |
| HER2_5 | 96061 | 38279 | 161160 | 161160 | 37882 | 37882 |
Supplemental Table 8: Annotation of novel splice events common to the individual breast cancers after eliminating all splice events that occur in normal breast samples or in the reference human genome (hg19), using direct exon model comparison
Common splice events in breast cancer groups after eliminating events similar to hg19 (columns give the type of splice event)
| Group | TSS | TTS | SKIP_ON | SKIP_OFF | MSKIP_ON | MSKIP_OFF |
|  | 10 | 0 | 108 | 103 | 32 | 10 |
| Non- | 5 | 3 | 120 | 99 | 38 | 6 |
| HER2-positive | 40 | 1 | 148 | 111 | 31 | 7 |
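The filtering behind Supplemental Table 8 can be expressed as set operations on splice events. This sketch assumes each event can be reduced to a hashable key such as (event type, chromosome, start, end); that encoding and the toy events are hypothetical, and the direct exon model comparison that produces the events is not reproduced.

```python
from collections import Counter
from functools import reduce


def novel_common_events(cancer_samples, normal_samples, reference_events):
    """Events present in every cancer sample but absent from all normal samples and the reference."""
    if not cancer_samples:
        raise ValueError("need at least one cancer sample")
    common = reduce(set.intersection, (set(s) for s in cancer_samples))
    seen_elsewhere = set(reference_events).union(*normal_samples)
    return common - seen_elsewhere


# Hypothetical events keyed as (event type, chromosome, start, end).
cancer = [
    {("SKIP_ON", "chr1", 100, 200), ("TSS", "chr2", 50, 60)},
    {("SKIP_ON", "chr1", 100, 200), ("TSS", "chr2", 50, 60), ("TTS", "chr3", 10, 20)},
]
normal = [{("TSS", "chr2", 50, 60)}]
reference = {("TTS", "chr3", 10, 20)}

novel = novel_common_events(cancer, normal, reference)
print(Counter(event[0] for event in novel))  # tally per event type, like the table's columns
```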
Supplemental Figure 1: Venn diagrams showing the overlap of reference-like transcripts (A) among the three breast cancer types and (B) between the breast cancer types and the normal breast samples (NBS).
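Each number in a Venn diagram such as this one is the size of one disjoint region. A sketch of that partition for three sets, using hypothetical transcript identifiers; deciding when two assembled transcripts count as the same (for example with Cuffcompare) happens before this step and is not shown.

```python
from itertools import product


def venn3_regions(a, b, c):
    """Return the size of each of the seven regions of a three-set Venn diagram.

    Keys name the sets an element belongs to, e.g. "AB" = in A and B but not C.
    """
    sets = {"A": set(a), "B": set(b), "C": set(c)}
    universe = sets["A"] | sets["B"] | sets["C"]
    regions = {}
    for membership in product([True, False], repeat=3):
        if not any(membership):
            continue  # elements outside all three sets are not drawn
        label = "".join(name for name, inside in zip("ABC", membership) if inside)
        regions[label] = sum(
            1
            for x in universe
            if all((x in sets[name]) == inside for name, inside in zip("ABC", membership))
        )
    return regions


print(venn3_regions({"t1", "t2", "t3"}, {"t2", "t3", "t4"}, {"t3", "t5"}))
```

Because the regions are disjoint, their sizes always sum to the size of the union.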
Supplemental Figure 2: CummeRbund plots of the expression-level distribution of all genes considered in the individual experimental conditions, shown as (A) a csDensity plot and (B) a dendrogram. Panel A: density against log10(FPKM) for genes; sample_name: NBT, Non_, HER2.
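The expression values in these plots are FPKM (fragments per kilobase of transcript per million mapped fragments). A simplified single-transcript sketch, FPKM = 10^9 * C / (L * N), with C the fragments assigned to the transcript, L its length in bases and N the total number of mapped fragments. Cufflinks itself estimates FPKM jointly across overlapping isoforms and uses effective lengths, so this is an approximation for illustration; the example numbers are hypothetical. The pseudocount of 1 corresponds to the "FPKM + 1" axis used in Supplemental Figure 6.

```python
import math


def fpkm(fragments, length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (length_bp * total_mapped_fragments)


def log10_plus_one(value):
    """log10(FPKM + 1): the pseudocount keeps unexpressed (FPKM = 0) features finite."""
    return math.log10(value + 1)


# Hypothetical transcript: 1,000 fragments, 2 kb long, 50 million mapped fragments.
value = fpkm(1000, 2000, 50_000_000)
print(value, round(log10_plus_one(value), 3))
```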
Supplemental Figure 3: Isoforms associated with statistically significant differentially spliced genes (p < 0.05) identified through pairwise comparisons of vs. NBS (A), Non- vs. NBS (B) and HER2-positive vs. NBS (C).
Supplemental Figure 4: Distributions of FPKM for (A) genes, (B) primary transcripts and (C) coding sequences across all four groups, shown as csBoxplot plots (sample_name: NBT, Non_, HER2).
Supplemental Figure 5: FPKM bins of transcripts assembled de novo with the Cufflinks assembler and classified as novel or reference-like with the Cuffcompare program in (A), (B) Non-, (C) HER2-positive and (D) normal breast samples.
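The binning in this figure can be sketched as follows. The bin edges are hypothetical (the caption does not give them), and treating the Cuffcompare class code "=" (complete match of the intron chain to a reference transcript) as reference-like and every other class code as novel is an assumption made for illustration.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical FPKM bin edges; each bin includes its lower edge.
BIN_EDGES = [1, 10, 100]
BIN_LABELS = ["<1", "1-10", "10-100", ">=100"]


def fpkm_bin(value):
    """Label of the FPKM bin that contains value."""
    return BIN_LABELS[bisect_right(BIN_EDGES, value)]


def bin_transcripts(transcripts):
    """Count (FPKM bin, class) pairs from (fpkm, cuffcompare_class_code) tuples.

    '=' is treated as reference-like and all other codes as novel (assumption).
    """
    counts = Counter()
    for fpkm_value, class_code in transcripts:
        kind = "reference-like" if class_code == "=" else "novel"
        counts[(fpkm_bin(fpkm_value), kind)] += 1
    return counts


print(bin_transcripts([(0.5, "="), (5, "j"), (5, "="), (250, "u")]))
```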
Supplemental Figure 6: Relative abundances (FPKM) of the top 20 statistically significant spliced genes identified from the pairwise comparisons between normal breast samples and (A), (B) Non- and (C) HER2-positive breast cancers (y-axis: FPKM + 1, log scale; sample_name: NBT, Non_, HER2).
Supplemental Figure 7: Overlap of statistically significant spliced genes identified from the pairwise comparisons (A) between normal breast samples and the, Non- and HER2-positive breast cancers and (B) among the cancer subtypes alone.
Supplemental Figure 8: A. The top panel shows the total number of differentially spliced genes associated with known and novel isoforms in (red), Non- (green) and HER2-positive (blue) breast cancers. The bottom panel presents the total number of differentially spliced isoforms that are expressed only in, Non- and HER2-positive breast cancers. B. Top 20 genes with an abundant novel splice isoform that is not expressed in NBT. The genes are sorted by the highest abundance (FPKM) in, Non- and HER2, and the relative abundance of the novel isoform in the other two breast cancer subtypes is also shown (color coded). The exon models for the novel isoforms are shown in Supplemental file 11.
Supplemental Figure 9: Venn diagrams showing the overlap among candidates identified by the Cuffdiff statistical tests at the level of pre-mRNA (TSS), splicing (Splice), promoter usage (Promoter) and coding sequence (CDS) between normal breast samples and (A), Non- (B) and HER2-positive (C) breast cancers.
Supplemental Figure 10: Relative abundance of TFAP2A isoforms
Supplemental Figure 11: Pathways influenced by the differentially spliced genes
NBS vs. differentially spliced genes: Cell Death, Cellular Function and Maintenance, Cell Cycle (27 out of 35 molecules)
NBS vs. Non- differentially spliced genes: Post-Translational Modification, Digestive System Development and Function, Embryonic Development (27 out of 35 molecules)
NBS vs. HER2-positive differentially spliced genes: Cell Morphology, Cellular Function and Maintenance, Embryonic Development (27 out of 35 molecules)
Supplemental Figure 12: Pathways influenced by the differentially expressed primary transcripts
NBS vs. differentially expressed primary transcript genes: Immunological Disease, Cell-to-Cell Signaling and Interaction, Cellular Movement (30 out of 35 molecules)
NBS vs. Non- differentially expressed primary transcript genes: Tissue Morphology, Cell Cycle, Hair and Skin Development and Function (29 out of 35 molecules)
NBS vs. HER2-positive differentially expressed primary transcript genes: Infectious Disease, Renal and Urological Disease, Antimicrobial Response (31 out of 35 molecules)
Supplemental Figure 13: Pathways influenced by the genes involved in differential promoter usage
NBS vs. differential promoter usage genes: Cellular Development, Cell-to-Cell Signaling and Interaction, Hematological System Development and Function (14 out of 35 molecules)
NBS vs. Non- differential promoter usage genes: Cell-to-Cell Signaling and Interaction, Connective Tissue Development and Function, Cancer (11 out of 35 molecules)
NBS vs. HER2-positive differential promoter usage genes: Tissue Development, Cell Death, Cell Morphology (19 out of 35 molecules)
Supplemental Figure 14: Overlap between differential splicing, differential primary transcript (TSS) expression, differential promoter usage and promoter switching in the (A), Non- (B) and HER2-positive (C) breast cancers compared with NBS.
Supplemental Figure 15: Exon models of the, Non- and HER2-positive validated novel isoforms. Validated novel isoform candidates: PHLPP2, LARP1 and ADD3.
Supplemental Figure 16: Predicted protein domain models of the validated novel hybrid isoforms, PHLPP2 (A), ADD3 (B) and LARP1 (C)
Supplemental Figure 17: Overlap between the mRNA-sequencing-based alternatively spliced genes and the genes identified from the comparative analysis of normal vs. ductal carcinoma in situ (DCIS) or invasive breast cancer (IBC)
Microarray-based normal vs. DCIS differentially spliced genes compared with mRNA-sequencing-based normal vs., Non- and HER2-positive differentially spliced genes: 7086 microarray only, 408 shared, 434 mRNA-seq only
Microarray-based normal vs. IBC differentially spliced genes compared with mRNA-sequencing-based normal vs., Non- and HER2-positive differentially spliced genes: 7696 microarray only, 374 shared, 468 mRNA-seq only