Identification of novel risk variants for sarcoma and other cancers by whole exome sequencing analysis in cancer cluster families


Jones, R. M. (2017). Identification of novel risk variants for sarcoma and other cancers by whole exome sequencing analysis in cancer cluster families. DOI: 4225/23/59f13ee5d5573

Link to publication in the UWA Research Repository

Rights statement

This work is protected by Copyright. You may print or download ONE copy of this document for the purpose of your own non-commercial research or study. Any other use requires permission from the copyright owner. The Copyright Act requires you to attribute any copyright works you quote or paraphrase.

General rights

Copyright owners retain the copyright for their material stored in the UWA Research Repository. The University grants no end-user rights beyond those which are provided by the Australian Copyright Act. Users may make use of the material in the Repository providing due attribution is given and the use is in accordance with the Copyright Act.

Take down policy

If you believe this document infringes copyright, raise a complaint by contacting repository-lib@uwa.edu.au. The document will be immediately withdrawn from public access while the complaint is being investigated.

Download date: 28. Apr. 2018


Identification of novel risk variants for sarcoma and other cancers by whole exome sequencing analysis in cancer cluster families

Submitted by Rachel Jones

This thesis is presented for the degree of Doctor of Philosophy

The University of Western Australia
School of Surgery
2017


Declaration

I, Rachel Jones, certify that:

This thesis has been substantially accomplished during enrolment in the degree.

This thesis does not contain material which has been accepted for the award of any other degree or diploma in my name, in any university or other tertiary institution.

No part of this work will, in the future, be used in a submission in my name, for any other degree or diploma in any university or other tertiary institution without the prior approval of The University of Western Australia and where applicable, any partner institution responsible for the joint-award of this degree.

This thesis does not contain any material previously published or written by another person, except where due reference has been made in the text.

The work(s) are not in any way a violation or infringement of any copyright, trademark, patent, or other rights whatsoever of any person.

The research involving human data reported in this thesis was assessed and approved by The University of Western Australia Human Research Ethics Committee. Approval number: RA/4/1/6434.

Third party editorial assistance was provided in the preparation of the thesis by Dr Tegan McNab.

For Gareth and Abbie

... [A] knowledge of sequences could contribute much to our understanding of living matter.
Frederick Sanger [1980]


Abstract

Cancer is a genetic disease caused by an accumulation of genetic and epigenetic alterations. Cancers can be caused by mutations that arise in single somatic cells, resulting in sporadic tumours, or by mutations that occur in the germline, resulting in hereditary predisposition to cancer. While only a small proportion of cancers are estimated to involve an inherited genetic mutation, familial clustering of cancers is relatively common. More than 100 cancer predisposition genes have been identified using a variety of genetic strategies. However, only a small proportion of familial cancer risk can be explained by established cancer susceptibility genes. The identification of genes that predispose individuals to cancer is of high importance in human medical research as inherited genetic variants in genes that metabolise and process drugs can influence response to treatment.

Sarcomas are a rare group of cancers that arise predominantly from the connective tissues of the body. Despite representing only 1% of all cancers, sarcomas are a high impact group of cancers that disproportionately affect the young. While it is sometimes difficult to distinguish sporadic from hereditary cancer, a rare cancer, such as sarcoma, occurring twice within one family is epidemiologically striking. The use of whole exome sequencing (WES) in families currently represents an optimal study design for the identification of rare genetic variants involved in the risk of cancer. Families in which multiple members develop a rare form of cancer, such as sarcoma, are more likely to have a mutation segregating in an inherited cancer gene compared to families affected by more common types of cancer.

In this study, three cancer cluster families (19 individuals) with a sarcoma proband were selected from the International Sarcoma Kindred Study, and WES was performed on germline DNA from both affected and unaffected family members using the Ion Proton platform at 100X coverage. WES data was annotated using Annotate Variation (ANNOVAR) and the Regulome database (RegulomeDB). Putative structural and regulatory variants were filtered using genomic location and variant class or RegulomeDB score. Three different strategies were used to prioritise rare private variants, known rare variants and candidate gene variants. Association and segregation analyses of the prioritised variants were used to identify eight nominally significant germline risk variants in the ARHGAP39, C16orf96, ABCB5, ZFP69B, UVSSA, BEAN1, KIF2C and PDIA2 genes that show segregation with cancer in the families.

Matched tumour and germline analyses were performed on WES data generated using the Illumina HiSeq 4000 at 60X coverage for two myxoid liposarcoma patients from two of the cancer cluster families. A total of 13 statistically significant somatic mutations were identified using VarScan2 and Strelka (PRMT5, ASPN, LAMA2, TET2, FHOD3, GATAD2A, ADSSL1, P4HTM, ABL1, SLC6A18, PLK2 and two intergenic variants between SLC22A20 and POLA2, and SDR16C6P and PENK). A region of loss of heterozygosity on chromosome 16 was also identified in one of the myxoid liposarcoma tumours.

Whole genome sequencing (WGS) of germline DNA using the Illumina HiSeq X Ten platform was available for 561 sarcoma cases and 1,144 healthy ageing controls from the Garvan Institute of Medical Research. Using this WGS data, variant burden analyses were performed independently for summed nonsynonymous deleterious variants and putative regulatory variants to validate target regions identified in the cancer cluster families. The target regions were defined as the genes in which candidate germline and somatic mutations were identified and included 1,000 bases either side. For intergenic variants, both flanking genes were included. Of the 21 regions analysed, six (C16orf96, SLC6A18, TET2, ARHGAP39, ABL1 and a region encompassing SLC22A20 and POLA2) were found to have a significantly higher burden of variants in sarcoma cases compared to controls (p-value < 2.38 × 10⁻³).

The current study was the first to perform WES on cancer cluster families identified by a sarcoma proband. The results indicate the utility of this approach to identify novel sarcoma candidate risk genes by sequencing a small number of mixed cancer cluster families and validating the results in larger population cohorts. Genomic regions identified in this study should be prioritised for further studies to determine the role of these genes in cancer and sarcoma pathogenesis.
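Two quantities in the abstract can be illustrated with a short sketch. The significance threshold quoted for the 21 target regions is numerically what a Bonferroni correction of 0.05 across 21 tests would give; the value 0.05 is an assumption here, since the abstract states only the threshold and the number of regions. The second helper shows the "1,000 bases either side" padding of a gene interval. The function names and the example coordinates are hypothetical and are not taken from the thesis.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def pad_region(start: int, end: int, flank: int = 1000) -> tuple[int, int]:
    """Extend a 1-based gene interval by `flank` bases on each side."""
    if start > end:
        raise ValueError("start must not exceed end")
    # Clamp at position 1 so the padded start never falls off the chromosome.
    return max(1, start - flank), end + flank


# alpha = 0.05 is an assumption; the 21 regions come from the abstract.
threshold = bonferroni_threshold(0.05, 21)
print(f"{threshold:.2e}")  # 2.38e-03

# Hypothetical gene coordinates, for illustration only.
print(pad_region(50_000, 62_500))  # (49000, 63500)
```

For an intergenic variant, the abstract indicates that both flanking genes were included, so a region of that kind would presumably span from the padded start of one gene to the padded end of the other.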


Contents

Declaration
Abstract
Table of contents
List of tables
List of figures
Acknowledgements
Authorship declaration
Abbreviations

1 Literature review
  Cancer
  Cancer genetics
  Familial cancers
  Familial cancer predisposition syndromes
  Familial cancer clusters
  Evidence for pleiotropic genetic risk factors
  Sarcoma
  Sarcoma genetics
  Methods for identifying genetic risk variants
  Linkage mapping
  Association
  DNA sequencing
  Whole exome sequencing

  1.6.5 Whole exome sequencing of cancer cluster families
  Next generation sequencing study considerations
  Known cancer predisposition genes
  Summary
  Aims
Aim 1: Whole exome sequencing of three cancer cluster families identified by a sarcoma proband
  Introduction
  Ion Proton platform
  Methods
  Families selected for whole exome sequencing
  DNA extraction
  Whole exome sequencing
  Library preparation
  Exome sequencing
  Sequence alignment and variant calling
  Variation to sequence alignment and variant calling
  Torrent variant caller plugin
  Genome analysis toolkit
  Intersect variant calls from Torrent Variant Caller and Genome Analysis Toolkit
  Recalibrate variants
  Genotype concordance
  Results
  Families selected for whole exome sequencing
  Whole exome sequencing
  Variant calling
  Recalibrate variants
  Genotype concordance
  Discussion
  Evaluation of families used in this study
  The use of whole exome sequencing to identify disease causing variants
  Limitations of whole exome sequencing

  2.4.4 The Ion Proton sequencing platform
  Base calling software
  Concordance
Aim 2: Identification of candidate germline risk variants in three cancer cluster families
  Introduction
  Bioinformatic strategies for variant filtering and prioritisation in whole exome sequencing
  Annotation
  Annotation of non-coding regions
  Variant class filtering
  Population frequency filtering
  Evolutionary conservation
  Functional impact prediction
  Association analysis in families
  Familial segregation
  Outline of chapter
  Methods
  Ascertainment bias correction
  Intersection
  Annotation and filtration
  Prioritisation strategies
  Prioritisation using a rare private variants strategy
  Prioritisation using a known rare variants strategy
  Prioritisation using a candidate gene strategy
  Methods for testing association of variants with cancer phenotypes
  Bonferroni correction
  Familial segregation analysis
  Evidence further supporting candidate risk genes
  Results
  Variant prioritisation
  Prioritisation using a rare private variants strategy
  Prioritisation using a known rare variants strategy

  Prioritisation using a candidate gene strategy
  Summary of annotated variants from each prioritisation strategy
  Rare private variants
  Association analysis in SOLAR
  Segregation analysis results
  Known rare variants
  Association analysis in SOLAR
  Segregation analysis results
  Candidate gene variants
  Association analysis in SOLAR
  Segregation analysis results
  Evidence further supporting germline risk genes
  Discussion
  Variant filtering and prioritisation strategies
  Association and segregation analyses of candidate risk variants in families
  The ABCB5 gene
  The KIF2C gene
  The PDIA2 gene
  Conclusion
Aim 3: A comparison of matched tumour and germline DNA from two sarcoma patients
  Introduction
  Myxoid liposarcoma
  Somatic variants
  Loss of heterozygosity
  Somatic copy number alteration
  Bioinformatic assessment of matched tumour and germline samples
  Somatic mutations and drug sensitivity
  Outline of chapter
  Methods
  Whole exome sequencing

  4.2.2 Pre-processing and quality control
  Adapter trimming
  Sequence alignment and calling
  BAM quality control
  Generate mpileup file
  Somatic variant calling using VarScan
  Somatic variant calling using Strelka
  Evidence further supporting somatic risk genes
  Drug sensitivity
  Loss of heterozygosity variant calling using VarScan
  Variant annotation and filtering
  Somatic copy number analysis using VarScan
  Results
  Whole exome sequencing
  Sequence alignment and calling
  BAM quality control
  Somatic variant calling VarScan
  Validation of somatic variants using Strelka
  Evidence further supporting somatic risk genes
  Drug sensitivity
  Loss of heterozygosity variants
  Copy number analysis
  Discussion
  Comparison of results in the context of published literature on myxoid liposarcoma genetics
  Strengths
  Limitations
  Summary
Aim 4: Variant burden analyses at candidate risk loci in sarcoma cases and healthy ageing controls
  Introduction
  Variant burden analyses in sarcoma cohorts
  Methods

  5.2.1 Study participants
  Whole genome sequencing
  Genomic regions selected for validation
  Statistical analyses
  Results
  Identification of nonsynonymous deleterious variants in the target regions
  Statistical analyses
  Nonsynonymous deleterious variants
  Putative regulatory variants
  Discussion
  Novel findings
  Known cancer genes
  Clinical implications
  Strengths and limitations
  Conclusion
Conclusion
  Summary of results
  Clinical utility of findings
  Review of methodology
  Recommendations for future work
Bibliography
Appendices
  A World Health Organisation classification of soft tissue tumours and bone tumours
  B Novel tumour-predisposing genes identified by whole exome sequencing
  C Familial cancer syndromes associated with sarcomas
  D Translocations associated with sarcomas

  E Genetically complex sarcomas
  F Known cancer predisposition genes
  G Candidate genes used for variant prioritisation based on a priori knowledge of cancer biology
  H Genes in which variants were also prioritised using the candidate gene prioritisation strategy
  I Patient 1-II-2: Copy number variation by chromosome
  J Patient 2-II-1: Copy number variation by chromosome
  K A list of nonsynonymous deleterious variants included in variant burden analyses
  L Gene identified by variant burden analyses by Ballinger et al. (2016) and Brohl et al. (2017)
  M A list of putative regulatory variants included in variant burden analyses

List of Tables

2.1 Parameters used to create whole exome sequencing run plans using Torrent Suite software
Parameters used to run the Torrent Variant Caller plugin to call bases
Parameters used for Genome Analysis Toolkit UnifiedGenotyper to call bases
Depth of coverage summary from Torrent Suite
Genome Analysis Toolkit VariantRecalibrator tranche results
Discordant genotype calls between the Agilent HaloPlex custom panel and whole exome sequencing for Patient 2-II
Discordant genotype calls between the Agilent HaloPlex custom panel and whole exome sequencing for Patient 3-III
Classification of Regulome database scores
Functional annotation of intersect file using ANNOVAR
Summary of variant annotation using Annotate Variation and Regulome Database for each prioritisation strategy
Summary of SOLAR association results for rare private variants
Summary of SOLAR association results for known rare variants
Summary of SOLAR association results for candidate gene variants
Summary of findings from in silico resources investigating the role of candidate germline risk variants in cancer pathogenesis
Summary of search results from PubMed for genes in which germline variants were identified
Parameters specified for VarScan2 somaticfilter to filter false positives from the high confidence somatic mutations

4.2 Raw data summary from Macrogen Inc. for Patient 1-II-2 and Patient 2-II-1 germline and tumour samples
Summary statistics generated using Samtools flagstat for Patient 1-II-2 and 2-II-1 germline and tumour samples
Results from VarScan2 somaticfilter to remove possible false positives from the high confidence somatic calls for Patient 1-II-2 and Patient 2-II
Somatic variants identified by VarScan2 and Strelka for Patient 1-II
Somatic variants identified by VarScan2 and Strelka for Patient 2-II
Summary of findings from in silico resources investigating the role of somatic risk variants and the genes in which they arise in cancer pathogenesis
Summary of search results from PubMed for genes in which somatic variants were identified
Statistically significant high confidence loss of heterozygosity variants for Patient 1-II
Genomic coordinates for target regions in which germline and somatic risk variants were identified
Classification of Regulome database scores
Annotated summary of nonsynonymous deleterious variants and putative regulatory variants in the target regions
Odds ratios, p-values and 95% confidence intervals from Fisher's exact test for target regions for nonsynonymous deleterious variants
Odds ratios and p-values from Fisher's exact test for target regions for putative regulatory variants

List of Figures

1.1 Location of known cancer predisposition genes
Pedigree of family
Pedigree of family
Pedigree of family
Whole exome sequencing pipeline flowchart
The number of variants called by Torrent Variant Caller and Genome Analysis Toolkit UnifiedGenotyper, and the number of variants that were called by both callers (intersect)
Genome Analysis Toolkit VariantRecalibrator tranche plot
Genome Analysis Toolkit VariantRecalibrator projection for mapping quality rank sum (MQRankSum) versus haplotype score
Concordance of genotype calls between the Agilent HaloPlex custom panel and whole exome sequencing on Ion Proton for three patients
Genotypes for the ARHGAP39 variant that shows segregation in patients with cancer in family
Genotypes for the C16orf96 and ABCB5 variants that show segregation in patients with cancer in family
Genotypes for the ZFP69B, BEAN1, UVSSA and KIF2C variants that show segregation in patients with cancer in family
Genotypes for the PDIA2 variant that shows segregation in patients with cancer in family
Pedigree of family 1 highlighting sarcoma Patient 1-II-2 for tumour-germline comparison
Pedigree of family 2 highlighting sarcoma Patient 2-II-1 for tumour-germline comparison

4.3 Genome analysis toolkit depth of coverage summary for Patient 1-II-2 and Patient 2-II-1 germline and tumour DNA
Insert size histogram plots generated by Picard for Patient 1-II-2 and Patient 2-II-1 germline and tumour samples
Pedigree of family 1 indicating genotypes for each patient at chr16: (rs ) in the RBL2 gene


Acknowledgements

I would like to acknowledge support from Mandy Basson and the Board of Directors of the Abbie Basson Sarcoma Foundation Ltd (Sock it to Sarcoma!).

I would like to sincerely thank David Thomas, Mandy Ballinger and Mark Pinese, for providing the DNA samples and data used in this thesis. I would also like to acknowledge the participants from the International Sarcoma Kindred Study and the Medical Genome Reference Bank.

I would like to express my gratitude to my supervisors Eric Moses, Phillip Melton, David Wood, David Thomas and Evan Ingley, for their guidance and for the opportunity to pursue this project. I would also like to acknowledge Jane Allen and Barry Iacopetta for their support.

I would like to thank all my friends at the Centre for Genetic Origins of Health and Disease for your daily guidance and support, especially Alex Rea for his assistance in the lab and Gemma Cadby for her helpful advice and for reading drafts. I would also like to thank Tegan McNab for proofreading my thesis.

I am grateful to my family and friends who have always supported my studies.


Abbreviations

Abbreviation    Definition
*.bam           Binary Alignment/Map
*.bed           Browser Extensible Data
*.sam           Sequence Alignment/Map
*.vcf           Variant Call Format
ABC             ATP-binding cassette
Alt             Alternate allele
ANNOVAR         Annotate Variation
ASPREE          ASPirin in Reducing Events in the Elderly
ATP             Adenosine Triphosphate
ATPase          Adenosinetriphosphatase
ATRA            All-Trans-Retinoic-Acid
B               Benign
BCFtools        Binary Variant Call Format Tools
BWA             Burrows-Wheeler Aligner
BWA-MEM         Burrows-Wheeler Aligner Maximal Exact Matches
Chr             Chromosome
CNV             Copy Number Variation
COSMIC          The Catalogue of Somatic Mutations in Cancer
CpG             5'-C-phosphate-G-3'
CREB            cAMP Response Element-binding Protein
D               Deleterious
dbSNP           Short Genetic Variations Database
DNA             Deoxyribonucleic Acid
DNase           Deoxyribonuclease
dNTP            Deoxynucleotide
E2F             E2 Factor
ECM             Extracellular Matrix
ENCODE          Encyclopedia of DNA Elements
eQTL            Expression Quantitative Trait Loci
ER              Endoplasmic Reticulum
ERbB            Erythroblastosis
ERK             Extracellular Signal-Regulated Kinase
ESC             Embryonic Stem Cells
ExAC            Exome Aggregation Consortium
FAMMM           Familial Atypical Multiple Mole Melanoma
FFPE            Formalin-Fixed and Paraffin-Embedded
GATK            Genome Analysis ToolKit
GeneRIF         Gene References into Functions
GERP            Genomic Evolutionary Rate Profiling
GO              Gene Ontology
GOHaD           Centre for Genetic Origins of Health and Disease
GPCR            G Protein Coupled Receptor
GTP             Guanosine Triphosphate
GTPase          Guanosine Triphosphatase
GWA             Genome Wide Association
HapMap          International Haplotype Project
HDI             Histone Deacetylation Inhibitor
hg19            Human Genome build 19
hMSCs           Human bone marrow-derived Mesenchymal Stromal Cells
IG              Intergenic
IGV             Integrative Genomics Viewer
INDEL           Insertions and Deletions
Int             Intronic
isec            BCFtools Intersect
ISKS            International Sarcoma Kindred Study
Kb              Kilobase
LOD             Logarithm of the Odds
LOH             Loss Of Heterozygosity
MAF             Minor Allele Frequency
MGRB            Medical Genome Reference Bank
MPNST           Malignant Peripheral Nerve Sheath Tumour
MQRankSum       Mapping Quality Rank Sum
mRNA            Messenger Ribonucleic Acid
NCBI            National Center for Biotechnology Information
NGS             Next Generation Sequencing
NS              Nonsynonymous
NTR             Neurotrophins
OMIM            Online Mendelian Inheritance in Man
P               Possibly damaging
PDI             Protein Disulphide Isomerase
PNET            Primitive Neuroectodermal Tumour
PolyPhen-2      Polymorphism Phenotyping-2
Probit          Probability Unit
Q               Base Quality Score
QC              Quality Control
Rb              Retinoblastoma
Ref             Reference allele
RegulomeDB      Regulome Database
RNA             Ribonucleic Acid
Robo            Roundabout family of proteins
rs ID           Reference SNP Identification
S               Synonymous
SCNA            Somatic Copy Number Alteration
SIFT            Sorting Intolerant from Tolerant
SLBP            Stem-Loop Binding Domain
SNP             Single Nucleotide Polymorphism
SNV             Single Nucleotide Variant
SOLAR           Sequential Oligogenic Linkage Analysis Routines
T               Tolerated
TF              Transcription Factor
TMAP            Torrent Mapping Alignment Program
TVC             Torrent Variant Caller
UCSC            University of California Santa Cruz
USA             United States of America
UTR             Untranslated Region
UTR3            3' Untranslated Region
UTR5            5' Untranslated Region
UWA             The University of Western Australia
VQSLOD          Variant Quality Score Log-Odds
VQSR            Variant Quality Score Recalibration
WES             Whole Exome Sequencing
WGS             Whole Genome Sequencing

Chapter 1

Literature review

1.1 Cancer

Collectively, cancers are a diverse spectrum of human diseases with a common progression resulting from the failure to regulate normal cell growth, proliferation and apoptosis.[1] Cancers can arise from any of the cell or tissue types in the human body and are classified accordingly.[2] The most common cancers in adults are carcinomas (approximately 90% of cancers),[2] which are derived from epithelial cells that line body cavities and glands.[3] Lymphomas and leukaemias arise in the tissue that gives rise to lymphoid and blood cells and account for approximately 8% of human malignancies.[3, 4] Melanomas, retinoblastomas, neuroblastomas and glioblastomas are derived from dividing cells in melanocytes, ocular retina, neurons and neural glia, respectively.[3] Sarcomas arise from the connective tissues such as bones, tendons, cartilage and fat.[2]

Cancer is one of the leading worldwide causes of death with over 14 million people affected each year.[5] In 2012, there were 4.3 million premature deaths from cancer with premature deaths expected to increase 44% from 2012 to ,[7] The lost years of life and productivity caused by cancer represent the largest cost to the global economy compared to other causes of death.[8]

1.2 Cancer genetics

Cancer is a genetic disease arising from an accumulation of genetic and epigenetic mutations.[9] These mutations can deregulate multiple complex regulatory pathways of genes affecting cellular growth, division, migration, and survival.[10] Tumour genomes usually exhibit many mutations and can be highly unstable.[11] Mutations can range from intragenic mutations to large gains and losses of chromosomal material.[9]

A genetic mutation is a permanent change in the DNA sequence. A polymorphism is a genetic variation that is common in the population. The arbitrary cut-off between a mutation and a polymorphism is 1%; that is, the less common allele of a polymorphism must have a frequency of at least 1% in the population.[12] Mutations in a cancer genome can comprise the following types of DNA change: substitutions, insertions or deletions of small or large segments of DNA, rearrangements, copy number increases, and copy number reductions.[13] Cancer cells can also acquire new DNA sequences from viruses including human papillomavirus, Epstein-Barr virus, hepatitis B virus, human T-lymphotropic virus 1, and human herpesvirus.[14] Cancer genomes can also acquire epigenetic changes which alter chromatin structure and gene expression.[15]

There can be anywhere from tens to thousands of mutations per cancer genome.[16] The substantial variation in the number and pattern of mutations in individual cancers reflects exposure to different risk factors, DNA repair defects, and the cellular origins.[17] Mutations that occur in cancers fall into two functional categories: mutations required for tumourigenesis, and mutations that merely occur during tumourigenesis and do not contribute to the process. These are called driver and passenger mutations, respectively. Drivers confer a selective advantage during clonal evolution and therefore drive the tumourigenesis process.
Passenger mutations do not appear in tumours as a result of evolutionary selection, but rather as a variation that occurs by chance in a cell that harbours a driver mutation. It is likely that most cancers carry more than one driver mutation, and the number of drivers varies between cancer types.[13, 16]

Mutations can arise in three broad categories of genes: oncogenes, tumour suppressor genes, and genome stability genes. Mutations in oncogenes and tumour suppressor genes drive the tumourigenesis process by increasing proliferation or inhibiting apoptosis, respectively, whereas mutations in genome stability genes drive tumourigenesis by increasing the rate of mutations in other genes.[9] The characterisation of these genes has led to the discovery of the biochemical pathways underlying the process of tumourigenesis, and also to a better understanding of the normal homeostatic roles these pathways play in healthy cells and tissues.[21] Mutations in these three classes of genes can occur in single somatic cells, resulting in sporadic tumours, or in the germline, resulting in hereditary predisposition to cancer.

Sporadic cancers develop due to mutations that arise during a person's lifetime. The majority of cancers (90–95%) develop sporadically due to genetic mutations that result from DNA damage from exposure to environmental and lifestyle factors.[22] Environmental risk factors include occupational exposures (chemicals, dust, and industrial processes), sunlight, radiation, and environmental pollution.[23] Lifestyle factors that may increase the risk of developing cancer include smoking, excessive alcohol consumption, poor diet, obesity and physical inactivity, chronic infections, sun tanning, and sunburn.[24, 25]

Only a small proportion (5–10%) of cancers are estimated to involve an inherited genetic mutation. However, familial clustering of cancers is relatively common.[31] Familial clustering is the occurrence of a disease, such as cancer, in some families more often than would be expected from its frequency in the general population.[32] Familial clustering of cancer can be measured by familial proportion (the proportion of cases with an affected relative), which has been reported as high as 20% in prostate cancer.[33] Familial clustering of cancers is likely due to a combination of environmental factors, rare gene mutations with high penetrance and more common, lower penetrant gene variants that act together to increase cancer susceptibility.[32]
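Familial proportion, as defined above, is a simple ratio: the number of cases with at least one affected relative divided by the total number of cases. The following is a minimal sketch of that calculation; the function name and the ten-case cohort are hypothetical and serve only to illustrate the arithmetic.

```python
def familial_proportion(has_affected_relative: list[bool]) -> float:
    """Proportion of cases that have at least one affected relative."""
    if not has_affected_relative:
        raise ValueError("at least one case is required")
    # True counts as 1 and False as 0, so sum() counts the familial cases.
    return sum(has_affected_relative) / len(has_affected_relative)


# Hypothetical cohort of ten cases; True marks a case with an affected relative.
cohort = [True, False, False, True, False, False, False, False, False, False]
print(familial_proportion(cohort))  # 0.2
```

A familial proportion on its own does not separate shared environment from shared genotype, which is why the risk ratio and kinship-based measures discussed later in the chapter are used alongside it.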

1.3 Familial cancers

All cancers, both rare and common, show some degree of familial clustering.[37] Cancers can be two- to four-fold more common in first degree relatives of individuals with cancer.

1.3.1 Familial cancer predisposition syndromes

Familial clustering of cancers can sometimes represent a familial cancer predisposition syndrome. A familial cancer predisposition syndrome manifests when multiple members of a family inherit gene mutations that predispose them to one or more types of cancer.[39] These families have multiple affected individuals, and family members often show early onset of cancer, multiple primary sites of disease, and occasionally bilateral involvement of paired organs.[35, 39] Some cancer predisposition syndromes appear to confer an increased risk of adult-onset cancers, such as breast, ovarian and colorectal cancers. Other syndromes increase the susceptibility of tumour onset in childhood, such as hereditary retinoblastoma,[43] or early onset in both children and adults, such as von Hippel-Lindau disease.[44]

Most familial cancer predisposition syndromes are transmitted in a Mendelian autosomal dominant manner.[35, 45] Dominant mutations require only one defective allele to be present for the individual to be predisposed to cancer. Individuals with one defective and one normal allele are heterozygous. An example of a Mendelian autosomal dominant cancer predisposition syndrome is hereditary breast-ovarian cancer.[46] This syndrome is caused by mutations in the BRCA1 and BRCA2 genes.[47] Women with germline mutations in BRCA1 have a 46–65% risk of developing breast cancer by age 70, while those with a BRCA2 mutation have a lower risk of 43–45% by age 70.[40, 48]

Less often, familial cancer predisposition syndromes can be transmitted in an autosomal recessive manner. In the case of recessive mutations, both alleles must be mutated for the individual to have a predisposition to cancer.
Individuals who inherit a recessive germline mutation in a gene are known as carriers and carry the mutation in every cell of their body. There is a variable risk that a carrier will develop cancer. A carrier will not develop cancer unless the remaining normal allele is also mutated. The particular mutation, other genes, and dietary, lifestyle and environmental factors can influence risk.[49] The likelihood that a carrier will develop cancer is defined as the penetrance of the mutation.[3] An example of an autosomal recessive cancer predisposition syndrome is xeroderma pigmentosum complementation group A, characterised by increased sensitivity to sunlight with the development of carcinomas at an early age.[50] Xeroderma pigmentosum complementation group A has been associated with homozygous or compound heterozygous mutations in the XPA gene.[50]

The study of familial cancer predisposition syndromes has led to the identification of genes critical to carcinogenesis and has also informed our understanding of the fundamental biology of human cancer.[51] Li and Fraumeni (1969) described the first familial cancer syndrome in four unrelated children with sarcoma and other affected family members.[52] They hypothesised that the occurrence of various malignancies in a family might represent a familial cancer syndrome due to the transmission of an autosomal dominant gene mutation.[52] In 1990 the TP53 gene was identified as the underlying gene responsible for Li-Fraumeni syndrome.[53] The TP53 gene encodes a tumour suppressor protein that responds to diverse cellular stresses to regulate expression of target genes, thereby inducing cell cycle arrest, apoptosis, senescence, DNA repair, or changes in metabolism. Germline mutations in the TP53 gene were later established to also be the underlying genetic cause for many other malignancies.

1.3.2 Familial cancer clusters

There are also familial cancer clusters that are not defined by known hereditary cancer syndromes. Familial cancer clusters are those that do not exhibit the features of hereditary types of cancer but occur in more individuals in the family than statistically expected.[36] In addition to familial clustering for the majority of specific cancers, aggregation of different types of cancers in families has also been observed. For example, individuals with BRCA1 and BRCA2 mutations have not only increased susceptibility to breast and ovarian cancers, but also colon, cervix, uterus, pancreas, and prostate cancers.[57]

1.4 Evidence for pleiotropic genetic risk factors

Early studies assessed the discordant clustering of cancer in families to determine if there was a general susceptibility to cancer. Case-control, registry- and population-based studies have evaluated familial clustering using risk ratio and kinship coefficient estimations. 58 Identifying shared genetic associations between diseases (pleiotropy) is a useful approach to identify new risk loci, and may elucidate common aetiologies and help in risk prediction. 59

The largest studies using the Utah Population and Cancer Registry Database and the Swedish Family-Cancer Database demonstrated excess familial clustering at almost every cancer site in the body. 34, However, these studies focused on familial clustering exclusively in nuclear families and were therefore unable to separate the roles of shared environmental and genetic factors in the familial aggregation of cancer. A more extensive study by Cannon-Albright et al. (1994) used the Utah Population Database to evaluate familial clustering for more distant relatives. 60 As familial risk can be due to shared exposure to an environmental risk and/or a common genetic mutation, examination of familial clustering in near and distant relatives is useful. In more distant relationships, shared familial environment might be less likely, and the probability of shared genotypes can be measured. 60 This study found that there was significant clustering of cancer outside the nuclear family for cancer sites. 60 These results support the hypothesis of an inherited basis to cancer of almost all sites and support the existence of more than one susceptibility locus for some cancers. 60

In support of this finding, a study by Amundadottir et al. (2005) analysed familial aggregation of cancer in extended families from Iceland to search for genetic factors that contribute to cancer at one or more sites in the body. 58 The authors found that most cancer sites demonstrated a significantly increased risk for the same cancer beyond the nuclear family. 58 They also found significantly increased familial clustering between different cancer sites in both close and distant relatives. 58 Therefore, Amundadottir et al. concluded that genetic factors are involved in the aetiology of many cancers and that these factors are in some cases shared by different cancer sites. 58 These findings support the conclusions by Cannon-Albright et al. However, shared environment or non-random mating for certain risk factors also play a role in the familial clustering of cancer. 58 Several types of study designs can be used to identify genetic risk variants that may be involved in the aetiology of cancers.

1.5 Sarcoma

Sarcomas are a rare group of cancers that arise predominantly from the embryonic mesoderm (the connective tissues of the body), for example, bones, muscles, cartilage and fat. There are over 70 different subtypes of sarcoma that are grouped into two broad classifications of bone or soft tissue (Appendix A). 64 The majority of sarcomas arise in the soft tissue, while malignant bone tumours make up just over 10% of all sarcomas. 65 Soft tissue sarcomas are often further sub-categorised by the line of differentiation, for example, liposarcoma (fat), leiomyosarcoma (smooth muscle), rhabdomyosarcoma (skeletal muscle) and fibrosarcoma (connective tissue). 66, 67 Bone tumours are further classified into bone-forming tumours, cartilage-forming tumours, marrow tumours, or vascular tumours. 68 It can be difficult to diagnose and classify this diverse group of malignancies with overlapping histological features. However, it is important to correctly determine the specific histologic subtype for management and treatment decisions. 67, 68

Sarcomas are a high impact group of cancers that disproportionately affect the young. Although sarcomas are rare, they contribute significantly to the burden of disease as they tend to affect teenagers and young adults. 69, 70 Sarcomas represent only 1% of all cancers in adults but represent 10% of cancers in children and 8% of cancers in adolescents and young adults. 71 There are approximately 800 new sarcoma cases in Australia each year.

Sarcoma genetics

There is evidence to suggest a strong genetic basis to sarcomas. First, sarcomas disproportionately affect the young, with early age at diagnosis associated with a genetic basis for many heritable diseases, including hereditary cancers. 73, 74 Second, sarcomas are over-represented among survivors of melanoma, breast cancer, thyroid cancer, Hodgkin's lymphoma, and leukaemias. 75 Third, sarcoma survivors are at increased risk of secondary cancers. 76 Finally, several rare genetic syndromes, such as Li-Fraumeni syndrome, are associated with sarcomas. 35 Appendix C contains a summary of hereditary syndromes associated with sarcoma including genes and genomic locations.

In addition to sarcomas being associated with familial cancer predisposition syndromes, 52, sarcomas also show evidence of familial clustering. Up to 33% of paediatric sarcomas are estimated to be associated with a significant family history of cancers. 80 The risk of sarcomas is increased six-fold in relatives of children with sarcoma compared to age-matched controls. When a causal gene mutation is identified, this risk increases to over 250-fold. 81

Whilst some sarcomas are associated with familial inherited predisposition, most sarcomas do not have a known cause. Very little is currently known about the causes of sarcoma because sarcomas are so rare. 65 Several risk factors have been associated with sarcomas including ionising radiation, 82, 83 viruses (Epstein-Barr virus 84 and Kaposi's sarcoma-associated herpes virus 85), occupation, exposure to chemicals, hormones, 97, 98 antibiotics, 99 medications for nausea used during pregnancy, 100 use of antibiotics in babies, 101 birth weight, 102 gestational age, , 105 birth order, and maternal age.

Sarcomas that arise due to somatic mutations are classified into two main groups based on genetics:

1. Sarcomas with specific recurrent genetic mutations on a background of relatively few other chromosomal changes
2. Sarcomas with no specific genetic mutations on a complex background of numerous chromosomal changes

Approximately one-third of all sarcomas have specific recurrent genetic mutations. 106 These tumours either contain disease-specific chromosome translocations or specific activating mutations. Most sarcomas with specific recurrent genetic mutations are characterised by balanced or reciprocal translocations (the exchange of pieces between two chromosomes), resulting in two derivative chromosomes with no net gain or loss of chromosomal material. 107 In some cases, only one derivative chromosome is formed, and some genetic material is lost. The fusion proteins produced as a result of the translocation can contribute to oncogenesis by increasing cell proliferation, promoting anchorage-independent cell growth, overriding cell contact adhesion, inhibiting apoptosis, enhancing invasion and suppressing terminal differentiation. 107 Appendix D contains a table of known translocations that have been associated with sarcoma.

The remaining sarcomas in the specific recurrent genetic mutations group are characterised by specific activating mutations. 108 These tumours show some degree of aneuploidy, but generally have less disordered karyotypes than the complex group of sarcomas. 83 An example of a sarcoma subtype with a specific activating mutation is gastrointestinal stromal tumours, which have activating mutations in KIT or PDGFRA. 109, 110

The remaining two-thirds of sarcomas have highly complex unbalanced karyotypes lacking specific genetic translocations. 66, 111 This group is mostly composed of spindle cell or pleomorphic sarcomas including leiomyosarcoma, myxofibrosarcoma, pleomorphic liposarcoma, pleomorphic rhabdomyosarcoma, malignant peripheral nerve sheath tumour, angiosarcoma, extraskeletal osteosarcoma and spindle cell/pleomorphic unclassified sarcoma (previously known as spindle cell/pleomorphic malignant fibrous histiocytoma). 111 These neoplasms show gains and losses of many chromosomes or chromosome regions and amplifications. 111 Many of them share recurrent aberrations (such as the gain of 5p13-p15) that play a significant role in tumour progression or metastatic dissemination. 111 Appendix E lists the genomic regions identified in complex sarcomas.

1.6 Methods for identifying genetic risk variants

Several study designs can be employed to identify genetic risk variants. Each study design is suited to identifying different types of mutations, from highly penetrant genes in rare Mendelian disorders to low-penetrance variants in more common diseases, and rare variants.

1.6.1 Linkage mapping

Linkage mapping in families has been used with success in localising highly penetrant disease-causing genes (e.g., BRCA1 and BRCA2) and, in particular, those involved in rare Mendelian human diseases (e.g., Online Mendelian Inheritance in Man (OMIM)). Linkage analysis in families is a form of positional cloning and makes no underlying assumptions about the nature of the genes involved. In human disease studies, the aim of linkage mapping is first to determine the chromosomal location of putative risk genes by identifying polymorphic DNA markers that cosegregate with a disease of interest. The genes in such linkage regions are referred to as positional candidate risk genes. These genes are then prioritised for further genetic and molecular analyses to identify the specific causal mutations or polymorphisms.

Linkage mapping has been used to identify highly penetrant susceptibility alleles associated with Mendelian familial cancer predisposition syndromes. However, these variants explain only a small fraction of the genetics of all cancer cases. For example, inherited mutations in the BRCA1 and BRCA2 genes account for approximately 2% to 3% of all breast cancer cases. 118, 119 However, more prevalent founder mutations in these genes can explain up to about 10% of the disease in some populations. 47,

Association

It has been postulated that more common cancers that do not show a clear pattern of inheritance are caused by many genes that each confer a small risk of disease. 124 For disease risk genes of small effect, association studies can be more powerful than linkage studies. 125 With the advent of dense panels of single nucleotide polymorphism (SNP) markers and high-throughput technology for efficiently genotyping them in thousands of individuals, the genome-wide association (GWA) study design was adopted widely for the genetic analysis of common human diseases. GWA is also a form of positional cloning and relies on linkage disequilibrium, the non-random association of alleles at different loci that is a function of population history.

GWA studies have identified thousands of lower penetrance risk variants for common human traits and diseases, typically with small effect size (odds ratio between 1.1 and 1.5). 126 Lower penetrance genetic variants associated with non-familial syndrome breast cancer confer slight risk alterations (odds ratio of approximately 1.2), 127 compared to the high penetrance variants in BRCA1 and BRCA2 identified by linkage, which have odds ratios of 2 or higher. Most genetic cancer risk variants identified so far confer relatively small increments in risk and explain only a small proportion of familial clustering. 128 The inability of the risk variants detected by GWA studies to account for much of the heritability of most common disorders, termed missing heritability, has led to an emerging view that rare variants with larger effect sizes could be responsible for a substantial proportion of genetic risk for complex human disease. 129 Significant advances in genome sequencing have now offered the possibility of using this technology as an alternative study design to GWA studies for the detection of rare genetic risk variants.

DNA sequencing

DNA sequencing analysis is the process of determining the precise order of nucleotides in a given DNA sample. One aim of DNA sequencing is to identify genomic variations and to associate those changes with human disease. A breakthrough in DNA sequencing technology was the development of Sanger's chain termination method. 130 In this approach, sequencing occurs by the selective incorporation of a single chain-terminating dideoxynucleotide by DNA polymerase. 130 For approximately 40 years Sanger sequencing was the most widely used approach.

Since the completion of the Human Genome Project in 2003, there have been substantive developments in Next Generation Sequencing (NGS) technologies. Whereas the first human genome took 13 years and several billion dollars to complete, a human genome can now be sequenced in a day for US$1,000 (at 20X coverage). The speed of sequencing has increased as NGS enables the simultaneous detection of multiple mutations in multiple genes by the parallel sequencing of millions of different DNA fragments. 131 The development of affordable and efficient next generation DNA sequencing technologies has now provided a new study paradigm to search for rare risk variants involved in common, complex diseases.

The impact of NGS technology on the discovery of genetic variants in human disease has been profound. Since the introduction of NGS there have been enormous advances in speed, read length and throughput of sequencing studies. 132 The advent of NGS has allowed the inquiry of nearly every base in the genome. 133 The growth in cancer genomics discovery has been unprecedented; knowledge of genes frequently mutated in cancer has grown from four genes in 2004 to over 600 genes currently listed in the Catalogue of Somatic Mutations in Cancer (COSMIC) (v79, released 14-NOV-16). 134 Initiatives such as the Cancer Genome Atlas 135 and the International Cancer Genome Consortium 136 have employed NGS strategies to characterise tumour genomes and provide multi-platform data for thousands of tumours from a variety of cancer types and subtypes. 137

Typical NGS applications include DNA sequencing, RNA sequencing (to measure gene expression changes and to discover new transcripts), chromatin immunoprecipitation sequencing (to detect genome-wide transcription factor binding sites and chromatin-associated modifications) and methylation sequencing (to profile various types of DNA methylation). 138, 139 Next generation DNA sequencing can be used for whole genome sequencing (WGS), whole exome sequencing (WES), or sequencing of a specifically targeted region of the genome. 138 The NGS workflow consists of multiple steps including library preparation and enrichment, sequencing, base calling, sequence alignment and variant calling.

Whole exome sequencing

WES involves sequencing only the protein-coding region of the genome. The human exome makes up approximately 1% of the human genome. However, the majority (85%) of disease-causing mutations in Mendelian disorders are expected to arise in the exome. 140 Therefore WES is a cost-effective initial strategy to identify disease-causing variants. In the last decade, WES of unrelated individuals or families with multiple members affected by a rare disorder has identified the genetic basis of diseases such as Freeman-Sheldon syndrome, Kabuki syndrome, Miller syndrome, and autosomal dominant spinocerebellar ataxia. WES studies have also identified more than 50 novel tumour-predisposing genes, listed in Appendix B.

Whole exome sequencing of cancer cluster families

While WES has been used with great success to identify novel tumour predisposing mutations, only one published study has used WES to identify pleiotropic genetic risk variants that predispose families to more than one type of cancer. The recent WES study by Thutkawkorapin et al. (2016) utilised NGS technology to investigate a family with a dominant cancer syndrome with a high risk of both rectal and gastric cancer. 147 The authors hypothesised that the mixed representation of rectal and gastric cancer among family members was due to one predisposing mutation in one gene. 147 The authors performed WES in three family members, two with rectal cancer and one with gastric cancer, and followed up with WES and Sanger sequencing in additional family members, other patients and controls. 147 Thutkawkorapin et al. identified 12 novel nonsynonymous single nucleotide variants (SNVs) shared among five affected members of this family. The authors suggested that at least five of the 12 variants may be candidates that contributed to the disease in the family. 147 These variants did not segregate in other families and are therefore unlikely to be highly penetrant variants.

1.7 Next generation sequencing study considerations

NGS technologies can be used to identify rare variants in tumour or germline DNA that increase an individual's susceptibility to developing cancer. 133 It is essential to compare tumour DNA with matched germline DNA to determine somatic and germline alterations in cancer. 148 Germline variants exist in the normal germline sequence. 149 Somatic variants are those in the tumour sequence but not in the normal germline sequence. 149 The ability of NGS to detect somatic variants depends on the variant frequency within the tumour sample, sample contamination, tumour heterogeneity, sequencing error, and the scarcity of somatic mutations within a genome. 150, 151

Recently there has been a return to family-based designs to identify rare risk variants involved in common human disease, based on the hypothesis that affected members of the same family will carry the same rare susceptibility variant. 133, Therefore, the number of individuals needed for rare variant discovery is potentially smaller than in cohorts of unrelated individuals. 133 Families used in these types of studies to identify rare inherited variants can either be consanguineous families, or non-consanguineous, large multigenerational and multiplex pedigrees. 133 Targeted sequencing technologies have been used to successfully identify new causal genes in hereditary non-polyposis colon cancer and familial adenomatous polyposis, 156 and hereditary breast and ovarian cancers. 157

Two-phase NGS family study designs are recommended. In the first phase, family members are sequenced, and the discovered variants are ranked according to their likelihood of being associated with the trait. 158 In the second phase, the variants are tested for association in an independent population-based sample. 158

Families used to study cancer clustering should be selected carefully. Suitable families have multiple affected and unaffected individuals from two or more generations available for analysis. 49 Families in which various members develop a rare form of cancer, such as sarcoma, are more likely to have a mutation segregating in an inherited cancer gene compared to families affected by more common types of cancer, for example, adenocarcinomas of the lung, breast, prostate, and colon. 49 Therefore, ideal families for genetic studies of familial clustering of cancers are those with multiple generations of affected and unaffected individuals and families with multiple cases of a rare form of cancer such as sarcoma.

1.8 Known cancer predisposition genes

Over the last 30 years, more than 100 cancer predisposition genes have been identified using a variety of strategies. 134, Figure 1.1 shows the location of known cancer predisposition genes, and a full list of known cancer predisposition genes is available in Appendix F. However, only a small proportion of familial cancer risk can be explained by established cancer predisposition genes. 38, 162 The use of family-based NGS strategies in this field may facilitate the discovery of rare genetic mutations that explain the remaining genetic risk for cancer predisposition, if much of the missing genetic control is due to gene variants that are too rare to be picked up by GWA studies and have relatively large effects on risk.

1.9 Summary

There have been a substantial number of studies performed to identify genetic risk variants associated with cancer. Linkage studies have identified highly penetrant risk alleles associated with Mendelian autosomal dominant cancer predisposition syndromes. Association studies have been used to successfully identify lower penetrance variants associated with more common types of cancer. However, much of the heritability of cancer remains unexplained. The introduction of NGS technology has allowed the identification of rare variants that are expected to explain some of the missing heritability of cancer. Study considerations for using NGS in cancer research include sequencing both tumour and germline DNA to facilitate the differentiation of somatic and germline mutations, and using a family-based study design with multiple generations of affected and unaffected individuals and families with multiple cases of a rare form of cancer such as sarcoma.

To date, there have been few studies on shared genetic risk factors in cancer cluster families that are not defined by a known familial cancer predisposition syndrome. This study will employ the approach of performing WES in cancer cluster families of mixed cancer types. WES will be conducted on both affected and unaffected individuals from cancer cluster families that have been identified by a sarcoma proband to identify rare cancer predisposing variants. Only one previous study has used NGS technology to investigate shared genetic risk variants across multiple cancer types. 147 This study will be the second WES study performed on cancer cluster families to identify shared genetic risk variants, and the first WES study to select cancer cluster families by a sarcoma proband.
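The distinction between germline and somatic variants described in this chapter can be illustrated with a short Python sketch. It is not the analysis pipeline used in this study: the `Variant` record, the function name `classify_variants` and all coordinates are hypothetical placeholders, and a real comparison would also have to account for variant allele frequency, coverage and sample contamination.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """Hypothetical minimal variant record keyed by position and alleles."""
    chrom: str
    pos: int
    ref: str
    alt: str


def classify_variants(tumour_calls, germline_calls):
    """Split calls from a matched tumour/normal pair.

    Germline: variants present in the normal (blood) sample.
    Somatic: variants present in the tumour sample but absent from the normal sample.
    """
    tumour = set(tumour_calls)
    germline = set(germline_calls)
    somatic = tumour - germline
    return germline, somatic


# Made-up coordinates for illustration only.
blood = [Variant("chr1", 1000, "A", "G"), Variant("chr2", 2000, "C", "T")]
tumour = [Variant("chr1", 1000, "A", "G"), Variant("chr3", 3000, "G", "A")]

germline, somatic = classify_variants(tumour, blood)
print(sorted(somatic))
```

Exact matching on position and alleles is the simplest possible rule; it is used here only to make the set logic explicit.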

[Figure 1.1: Location of known cancer predisposition genes. Ideograms of chromosomes 1 to 22, X and Y, with markers indicating the position of each known cancer predisposition gene.]

1.10 Aims

The identification of genes that predispose individuals to cancer is a high priority in human medical research. It is anticipated that this knowledge will drive a new era of personalised human medicine, potentially allowing tailoring of specific drug treatments and interventions. The use of NGS in families currently represents an optimal study design for the identification of rare genetic variants involved in the risk of cancer and other common complex human diseases. Waves of novel genetic discoveries using this approach are now regularly appearing in the literature. While it is sometimes difficult to distinguish sporadic from hereditary cancer, rare cancer, such as sarcoma, occurring twice within the one family is epidemiologically striking. 163 The identification of genetic risk factors for cancer will be a significant contribution to medicine and particularly in the provision of health care to cancer patients and their families.

The aims of this study are:

1. To perform WES on three cancer cluster families identified by a sarcoma proband using peripheral blood samples.
2. To identify candidate germline risk variants by prioritising and filtering structural and regulatory variants that segregate with cancer or sarcoma in the three families.
3. To perform a matched tumour and germline analysis on two myxoid liposarcoma patients using peripheral blood genomic DNA and genomic DNA isolated from sarcoma tumour tissue to distinguish somatic mutations.
4. To validate the most significant putative germline and somatic cancer predisposing mutations in unrelated sarcoma cases and cancer-free controls.
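The idea of filtering for variants that segregate with disease (Aim 2) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the filtering procedure used in this study: the function name, the `exclude_unaffected` flag, the sample labels and the variant identifiers are all hypothetical. Excluding every variant seen in an unaffected relative is a deliberately strict rule, because incomplete penetrance means an unaffected relative may still carry a risk variant.

```python
def segregating_variants(carried, affected, unaffected, exclude_unaffected=True):
    """Return variants carried by every affected relative.

    carried: dict mapping a sample label to the set of variant IDs it carries.
    affected / unaffected: lists of sample labels.
    exclude_unaffected: if True, also drop variants seen in any unaffected
        relative (an assumption that ignores incomplete penetrance).
    """
    if not affected:
        raise ValueError("at least one affected individual is required")
    shared = set.intersection(*(set(carried[s]) for s in affected))
    if exclude_unaffected:
        seen_in_unaffected = set().union(*(carried[s] for s in unaffected))
        shared -= seen_in_unaffected
    return shared


# Hypothetical family: two affected relatives (A1, A2), one unaffected (U1).
carried = {
    "A1": {"v1", "v2", "v3"},
    "A2": {"v1", "v2"},
    "U1": {"v2"},
}

strict = segregating_variants(carried, ["A1", "A2"], ["U1"])
relaxed = segregating_variants(carried, ["A1", "A2"], ["U1"], exclude_unaffected=False)
print(sorted(strict), sorted(relaxed))
```

The relaxed mode keeps variants shared by all affected relatives regardless of carrier status in unaffected relatives, which is the safer choice when penetrance is expected to be incomplete.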


Chapter 2 Aim 1: Whole exome sequencing of three cancer cluster families identified by a sarcoma proband

2.1 Introduction

Next Generation Sequencing (NGS) has provided tremendous insight into the genomic landscape of several tumour types, including defining tumour subtypes, identifying new druggable targets and improving understanding of the heterogeneity of many tumours. 164, 165 Protein-coding genes constitute approximately 1% of the human genome but harbour nearly 85% of the disease-causing mutations of Mendelian diseases, although this may be due to ascertainment bias. 140, Genetic variations discovered in coding regions of genes may inform immediate treatment choices and also further other therapeutic discoveries. 170, 171 Therefore, exome sequencing is an efficient approach for identifying actionable variants.

The first aim of this study was to perform whole exome sequencing (WES) in three cancer cluster families ascertained from an index sarcoma patient.

2.1.1 Ion Proton platform

The Ion Proton platform from Thermo Fisher Scientific is a benchtop semiconductor-based sequencing system for human genome, exome or transcriptome sequencing. Semiconductor sequencing is based on the detection of hydrogen ions that are released during the polymerisation of DNA using a sequencing-by-synthesis approach. 172 The Ion Proton sequencing chemistry uses native deoxynucleotides (dNTPs) and electronic sensors to detect the release of hydrogen ions as the dNTPs are incorporated into the growing DNA strand. 173 Microwells are sequentially flooded with each dNTP to distinguish the order of each nucleotide. 173 Homopolymer runs are detected by the magnitude of the pH change to determine how many nucleotides were added. 173 Errors on the Ion Proton are mostly insertions and deletions in homopolymer runs, owing to the difficulty of evaluating the magnitude of the signal when several dNTPs are incorporated in one cycle. 174

Automated sequencing analysis occurs using the Torrent Suite software that is preinstalled on the Torrent Server. The web-based interface can be used to plan, monitor and view the results of sequencing runs. The Torrent Suite base calling algorithm converts the raw file information into a sequence of bases and writes the sequence to an unaligned Binary Alignment/Map (*.bam) file. The *.bam file is then aligned using Torrent Mapping Alignment Program (TMAP). Variants are called using the Torrent Variant Caller (TVC). Both TMAP and TVC were developed specifically for Ion Torrent data and were used in this chapter.
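The way homopolymer length is inferred from signal magnitude, as described for semiconductor sequencing above, can be illustrated with a toy Python sketch. It is not the Torrent Suite base caller: the function `call_bases`, the repeating `TACG` flow order and the normalised signal values are all assumptions made for illustration, with a signal of 1.0 taken to mean one incorporated nucleotide.

```python
def call_bases(flow_order, signals):
    """Convert per-flow signals into a base sequence.

    flow_order: nucleotides flooded over the wells in turn, repeated cyclically
        (a simplified, hypothetical order).
    signals: normalised signal per flow; about 1.0 per incorporated nucleotide.
    Each signal is rounded to the nearest whole number of incorporations.
    """
    bases = []
    for i, signal in enumerate(signals):
        nucleotide = flow_order[i % len(flow_order)]
        count = max(0, int(signal + 0.5))  # nearest integer, never negative
        bases.append(nucleotide * count)
    return "".join(bases)


# Hypothetical signals: the last flow (3.4) sits between 3 and 4 incorporations,
# the kind of ambiguity that produces homopolymer insertion/deletion errors.
signals = [1.02, 0.05, 2.10, 0.00, 0.00, 0.97, 0.10, 3.40]
read = call_bases("TACG", signals)
print(read)
```

Because longer homopolymers give proportionally larger but noisier signals, a small measurement error near a rounding boundary changes the called length by one base, which is consistent with the indel-dominated error profile noted above.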

2.2 Methods

Families selected for whole exome sequencing

The patients were recruited from the International Sarcoma Kindred Study (ISKS). The ISKS was initiated in 2008 to investigate the prevalence and nature of heritable risk in sarcoma populations. 175 The ISKS is a global genetic, biological, epidemiological, and clinical resource for researchers to investigate the hereditary characteristics of sarcoma. Patients were recruited from several sites across Australia, France, New Zealand, India, the United States of America, the United Kingdom, and Canada. The ISKS Steering Committee granted access to the database for this study under an ethically approved protocol (the University of Western Australia (UWA) Human Research Ethics Committee RA/4/1/6434).

Patients with sarcoma (probands) were recruited from major sarcoma treatment centres, regardless of their family history of cancer. Individuals with adult-onset sarcoma (> 15 years old) were eligible for the ISKS. Family members were also invited to participate if the patient with sarcoma was < 45 years of age, or there was a significant family history of cancer. 175 Study questionnaires containing demographic, medical, epidemiological and psychosocial information were completed, including personal history of cancer or exposure to known risk factors for sarcoma. 176 Patients were also asked to donate a venous blood sample and tumour sample, as well as provide access to medical information and access to information about deceased relatives (collected from cancer registries and other health organisations). Medical history and treatment records were obtained for each proband where possible. 176 All reported cancer diagnoses were independently verified by medical records, Australian and New Zealand cancer registries or death certificates.

There are now more than 1,300 families enrolled in the ISKS with detailed pedigree information and cancer incidence verified for each. More than 1,800 blood samples have been collected and approximately 2,100 questionnaires completed. The average age at onset for sarcoma in the ISKS cohort is 46.6 years (range 3-95 years) with the majority being sarcomas of soft tissue. Family members have reported over 2,000 other cancers. The average age at diagnosis for these other cancers is 57.9 years compared to 65.6 years in the general population.

Since the establishment of the ISKS, several studies have focused on identifying TP53 germline mutations in Li-Fraumeni and the less stringent Li-Fraumeni-like syndrome in the cohort. 163, 176, 177 A previous study found pathogenic TP53 mutations in blood DNA of 20 of 559 sarcoma probands (3.6%) in the ISKS cohort. 176 The study of familial cancer cluster patterns in the ISKS identified 14% of the ISKS families with patterns of familial clustering without conforming to any known syndrome. 163 A more recent study using the ISKS discovered that more than half of the sarcoma patients had an excess of putatively pathogenic monogenic and polygenic germline variation in known and novel cancer genes using a case-control rare variant burden test. 178 The combination of findings that 14% of cancer cluster families in the ISKS do not conform to known syndromes and the excess of rare monogenic and polygenic germline mutations in more than half of the ISKS patients indicates the potential utility of this cohort to identify novel genetic risk factors for sarcomas and cancer clustering in families.

Three ISKS families that do not conform to known cancer syndromes were targeted for selection in the current study and represented a unique opportunity to identify novel variants that may influence sarcoma or cancer development. These three families were selected for the current study based on the following selection criteria:

- The sarcoma proband must have blood and tumour biospecimens available
- The pedigree must contain a first degree relative with cancer also with germline samples available
- The pedigree must contain at least one unaffected relative with germline material available, and
- The family is not defined by TP53 or other known familial cancer susceptibility genes

Family 1 (Figure 2.1) depicts a proband (Patient 1-III-1) who developed Ewing's sarcoma at 15 years of age, as well as a non-identical twin brother (Patient 1-III-2) who has not developed sarcoma. The proband's father (Patient 1-II-2) developed myxoid liposarcoma at 39 years of age. Germline DNA was available from the proband and father, and from the proband's twin brother, mother (Patient 1-II-3), an aunt (Patient 1-II-1) and grandparents (Patient 1-I-1 and Patient 1-I-2), who were all unaffected by cancer.

Family 2 (Figure 2.2) was identified by a proband (Patient 2-II-2) who developed myxoid liposarcoma at 61 years of age. The proband's father (Patient 2-I-1) developed prostate cancer at 71 years old, and two of the proband's sisters were diagnosed with skin melanomas at 44 (Patient 2-II-3) and 46 (Patient 2-II-2) years of age. Germline DNA was available for the proband, one of his unaffected children (Patient 2-III-1), three of his sisters (including an unaffected sister, Patient 2-II-4), and his parents (Patient 2-I-1 and Patient 2-I-2).

In family 3 (Figure 2.3), there are two individuals with sarcoma: the proband (Patient 3-III-1), who developed a primitive neuroectodermal tumour (PNET) at 22 years of age, and her grandmother (Patient 3-I-1), who developed malignant peripheral nerve sheath tumour (MPNST) at 79 years old. The proband's father (Patient 3-II-1) was diagnosed with prostate cancer at 51 years of age, and the proband's aunt developed breast cancer at age 36. Germline DNA was available from the proband, her parents (Patient 3-II-1 and Patient 3-II-2), her unaffected brother (Patient 3-III-2), and her grandmother.

[Figure 2.1: Pedigree of family 1. Individuals shown: 1-I-1, 1-I-2, 1-II-1, 1-II-2, 1-II-3, 1-III-1 and 1-III-2. Diagnoses annotated (age at diagnosis): sarcoma (39), sarcoma (15). Key: affected male, affected female, unaffected male, unaffected female, proband.]

Figure 2.2: Pedigree of family 2 (pedigree diagram; key: affected male, affected female, unaffected male, unaffected female, proband)

Figure 2.3: Pedigree of family 3 (pedigree diagram; key: affected male, affected female, unaffected male, unaffected female, proband)

2.2.2 DNA extraction

DNA extraction was performed by researchers at the Peter MacCallum Cancer Centre in Melbourne, Australia. Anti-coagulated blood was processed using a Ficoll gradient. DNA was extracted from the nucleated cell product using the QIAamp DNA blood kit (Qiagen).

Whole exome sequencing

WES was performed by the candidate at the Curtin University - UWA Centre for Genetic Origins of Health and Disease (GOHaD). The germline samples from two patients (Patient 3-I-1 and Patient 3-III-2) were badly degraded and of poor quality. Whole genome amplification was therefore performed on these samples using a Qiagen REPLI-g Mini Kit (Qiagen) as per the manufacturer's instructions. Exome library preparation was performed using the Thermo Fisher Scientific Ion AmpliSeq Exome RDY Kit as per the manufacturer's instructions. Libraries were loaded onto the Ion PI v2 BC Chip (Thermo Fisher Scientific) using the Ion Chef and sequenced on the Ion Proton as per the manufacturer's instructions. An overview of the WES pipeline is shown in Figure 2.4.

Library preparation

Target regions were amplified from 100 ng of genomic DNA using the Ion AmpliSeq Exome RDY Library Preparation kit, in Ion AmpliSeq Exome RDY plates with the Ion AmpliSeq HiFi Mix. The amplicons were treated with FuPa reagent to partially digest the primers and to phosphorylate the amplicons. The amplicons were then ligated to Ion Xpress Barcode Adapters, purified and dissolved in 50 µl of Low TE. Validation of enrichment and quantification of target DNA were performed by qPCR on the ViiA 7 (Thermo Fisher Scientific). Three 10-fold dilutions of Escherichia coli control library were prepared at 6.8 pM, 0.68 pM and 0.068 pM. 9 µl of each control library or sample was added to a well of a 96-well qPCR plate together with 11 µl of the reaction mixture, giving a total reaction volume of 20 µl. The qPCR was run for 40 cycles.
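The quantification step above relies on a standard curve built from the ten-fold control dilutions. The following is a minimal sketch of that arithmetic, not the instrument software's method: it fits Ct against log10(concentration) by least squares and reads an unknown library concentration off the fitted line. The Ct values and the function names are hypothetical and chosen only for illustration.

```python
import math


def fit_standard_curve(concentrations_pm, ct_values):
    """Least-squares fit of Ct = slope * log10(concentration) + intercept."""
    xs = [math.log10(c) for c in concentrations_pm]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def concentration_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate a concentration in pM."""
    return 10 ** ((ct - intercept) / slope)


def amplification_efficiency(slope):
    """Efficiency implied by the slope; 1.0 means perfect doubling per cycle."""
    return 10 ** (-1.0 / slope) - 1.0


# Ten-fold dilution series of the control library (pM).
standards_pm = [6.8, 0.68, 0.068]
# Hypothetical Ct values, assumed for illustration only.
standard_cts = [15.0, 18.32, 21.64]

slope, intercept = fit_standard_curve(standards_pm, standard_cts)
sample_pm = concentration_from_ct(18.32, slope, intercept)
efficiency = amplification_efficiency(slope)
```

A slope close to -3.32 cycles per ten-fold dilution corresponds to an efficiency near 100%, which is a common sanity check before trusting the interpolated sample concentrations.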

Samples: 19 germline samples
Library preparation: Ion AmpliSeq Exome RDY Library Preparation
Sequencing platform: Life Technologies Ion Proton
Quality check: Torrent Suite software
Sequence alignment: Torrent Mapping Alignment Program
Variant calling: Torrent Variant Caller plugin; Genome Analysis Toolkit UnifiedGenotyper
Merge using bcftools: BCFtools intersect

Figure 2.4: Whole exome sequencing pipeline flowchart

Exome sequencing

Run plans were created for each chip with the barcode and sample identity number on the Torrent Browser server. The plans were created using the Torrent Suite Software with the run parameters listed in Table 2.1.

Table 2.1: Parameters used to create whole exome sequencing run plans using Torrent Suite software

Parameter | Specified
Application | DNA
Kit | Ion AmpliSeq Exome Kit
Library kit type | Ion AmpliSeq Exome RDY IC Kit 1x8
Template kit | Ion Chef, Ion PI IC 200 kit
Flows | 520
Chip type | Ion PI chip
Barcode set | IonXpress
Reference library | Human genome build 19 (hg19)
Plug ins | variantcaller and coverageanalysis

The sample libraries were diluted to approximately 50 pM, the optimal input concentration. The Ion PI v2 BC chips were prepared for loading by performing alternate washes with 100% isopropanol, Ion PI Chip Preparation Solution, nuclease-free water, 0.1 M NaOH, and 1X Ion Chip Priming Solution as per the manufacturer's instructions. The Ion PI IC Reagents 200 cartridge was removed from the freezer and warmed to room temperature 45 min before the Ion Chef Instrument run. The Ion Chef Instrument was loaded with treated Ion chips, consumables, reagents and libraries as per the manufacturer's instructions (Thermo Fisher Scientific). The Ion Chef Instrument run completed overnight.

The following day, the Ion Proton Sequencer was initialised as per the manufacturer's instructions (Thermo Fisher Scientific). The Ion chips were unloaded from the Ion Chef Instrument, and the first chip was loaded into the Ion Proton Sequencer. The second chip was stored in a container at 4 °C until 20 min before the end of the first run. When the first run was completed, the second chip was loaded immediately for sequencing.

Sequence alignment and variant calling

The Torrent Suite software (Life Technologies) Torrent Variant Caller (TVC) was used to perform base calling. The resulting base calls were stored in an unmapped *.bam format. The Torrent Suite Torrent Mapping Alignment Program (TMAP) was used to align sequencing reads to the reference genome, human genome build 19 (hg19). TMAP takes some or all of the reads produced by the WES pipeline as input, along with the reference genome and index files, and outputs a mapped *.bam file.

Variation to sequence alignment and variant calling

Torrent variant caller plugin

As an additional measure, base calling was performed a second time using the TVC Plugin (Life Technologies, version 5.0.0). The TVC Plugin software was installed on Magnus (Pawsey Centre), a Cray XC40 supercomputer. The AmpliSeq Exome capture browser extensible data (*.bed) file from Life Technologies was used as both the target region *.bed file and the primer trim *.bed file. The output is a variant call format (*.vcf) file containing meta-information lines, a header line and data lines for each position in the genome. 179 Each individual was called separately using TVC, generating 19 individual *.vcf files. The details used to run the TVC Plugin on Magnus are outlined in Table 2.2.

Table 2.2: Parameters used to run the Torrent Variant Caller plugin to call bases

Parameter | Specified
Input bam | All *.bam files from the Ion Proton
Reference fasta | hg19.fasta
Region bed | AmpliSeqExome designed.bed
Primer trim bed | AmpliSeqExome designed.bed
Error motifs | ampliseqexome_germline_p1_hiq_motifset.txt

Each of the 19 patients was called individually, and the calls were then merged using Binary Variant Call Format Tools (BCFtools) vcf-merge 179 to create a single *.vcf file. Because TVC calls each *.bam file individually, a position absent from one sample's calls may be either truly missing or homozygous reference. BCFtools missing-to-reference 179 was therefore run on the merged file to set unknown positions to homozygous reference (0/0).

Genome analysis toolkit

In addition to the single-sample calling with TVC, the Genome Analysis Toolkit (GATK, version 3.4.0) UnifiedGenotyper 180 was used to call the *.bam files as a check on base calling accuracy. GATK can perform multi-sample calling, so all 19 patients were called together. GATK UnifiedGenotyper was run on a secure Linux server owned by GOHaD (operating system: Bio-Linux, based on Ubuntu). UnifiedGenotyper uses a Bayesian genotype likelihood model to estimate the most likely genotypes and allele frequency in a population of samples simultaneously, and produces a genotype for each site. First, each sample was sorted using SAMtools sort and indexed using SAMtools index. 181 Picard CreateSequenceDictionary (version 2.4.1) was used to create a sequence dictionary for the reference sequence, and Picard BedToIntervalList was then used to convert the *.bed file to Picard interval list format. The specifications used to run GATK UnifiedGenotyper on the server are outlined in Table 2.3.

Table 2.3: Parameters used for Genome Analysis Toolkit UnifiedGenotyper to call bases

Parameter | Specified
Reference fasta | hg19.fasta
Genotype likelihoods model | SNP
Input bam | All sorted *.bam files from the Ion Proton
Target interval list | AmpliSeqExome bed
Out mode | EMIT_ALL_CONFIDENT_SITES
Metrics | Directory for metrics
Stand-conf-call | 50.0
Stand-emit-conf | 10.0
Annotation | AlleleBalance

Intersect variant calls from Torrent Variant Caller and Genome Analysis Toolkit

The resulting *.vcf files from TVC and GATK were combined using BCFtools intersect (isec) 181 with exact allele matching to identify the calls common to both callers. This tool creates both the intersections and the complements of the TVC and GATK *.vcf files. The intersect data from both callers was used for the remainder of the analysis.

Recalibrate variants

GATK VariantRecalibrator 180 was used to assign a well-calibrated probability to each variant call in the call set. It forms the first stage of a two-stage process called Variant Quality Score Recalibration (VQSR). In the first pass, VariantRecalibrator 180 builds a Gaussian mixture model from the distribution of annotation values over a high quality subset of the input call set and then scores all input variants according to the model. 180 The recalibrated variant quality score provides a continuous estimate of the probability that each variant is correct, allowing the call set to be partitioned into quality tranches. 182 The primary purpose of the tranches is to establish thresholds within the data that correspond to particular levels of sensitivity relative to the truth sets. In the second pass, the ApplyRecalibration tool 180 applies the model parameters to each variant in the input *.vcf files to produce a recalibrated VCF file in which each variant is annotated with its variant quality score log-odds (VQSLOD) value. 182 This step also filters the calls on the new logarithm of the odds (LOD) score by writing PASS in the FILTER column for variants that meet the specified threshold, and LowQual for variants that do not. 180 The filter level selected for the ApplyRecalibration tool was 99.0.

Genotype concordance

To validate the genotype calls, concordance was measured in three patients who had previously been genotyped. The three patients, Patient 1-II-2, Patient 2-II-1 and Patient 3-III-1, all sarcoma cases, had been genotyped previously through the ISKS using an Agilent HaloPlex custom panel of gene coding sequence capture. Genotype calls were compared across the three sarcoma cases to determine how many calls (0/0, 0/1 or 1/1) were the same between the intersect file and the previous genotyping using the Agilent HaloPlex custom panel. Any discordant variants were checked in the *.vcf files. The *.bam files were also visually examined in Integrative Genomics Viewer (IGV). 183, 184

2.3 Results

2.3.1 Families selected for whole exome sequencing

This study included 19 patients from three multigenerational mixed cancer families. Of these, 11 (58%) were female, and nine (47%) had been diagnosed with cancer. The average age of the patients at the time of blood collection was 55.3 years (range: 15 years to 90 years) and the average age of cancer (including sarcoma) onset was 47.5 years (range: 15 years to 79 years). The average age of onset in the three families is younger than the average age of onset of all cancers in the whole ISKS cohort (57.9 years) but similar to the age of onset of sarcomas (46.6 years).

2.3.2 Whole exome sequencing

Table 2.4 shows the summary statistics generated by the Torrent Suite software. The average depth of coverage across all samples (Table 2.4) was sufficient for detecting single nucleotide variants (SNVs). 185, 186 The average number of mapped reads was 38,484,361, and the average total genotyping rate was 98.9%.

Table 2.4: Depth of coverage summary from Torrent Suite

Patient | Mapped reads | On target | Mean Depth | Number of variants
1-I-1 43,848, % ,625
1-I-2 28,509, % ,690
1-II-1 28,343, % ,334
1-II-2 38,178, % ,113
1-II-3 39,158, % ,915
1-III-1 37,229, % ,670
1-III-2 42,568, % ,641
2-I-1 33,480, % ,574
2-I-2 48,585, % ,220
2-II-1 35,464, % ,678
2-II-2 45,333, % ,491
2-II-3 46,884, % ,238
2-II-4 36,173, % ,517
2-III-1 30,353, % ,282
3-I-1 34,870, % ,493
3-II-1 42,063, % ,329
3-II-2 40,663, % ,846
3-III-1 47,344, % ,337
3-III-2 32,146, % ,169
Average 38,484, % ,482

2.3.3 Variant calling

BCFtools missing-to-reference changed 5,099,324 unknown positions to reference positions in the merged TVC *.vcf file. In total, 109,503 variants were called by TVC and 238,530 variants were called by GATK UnifiedGenotyper. Figure 2.5 shows the number of calls made by TVC and GATK and the intersection of both callers. The intersect file from both callers contained 94,263 variants for all 19 patients.

Figure 2.5: The number of variants called by Torrent Variant Caller and Genome Analysis Toolkit UnifiedGenotyper, and the number of variants that were called by both callers (intersect). Genome Analysis Toolkit only: 144,267. Intersect: 94,263. Torrent Variant Caller only: 15,240.

2.3.4 Recalibrate variants

Figure 2.6 shows the tranche plot generated by GATK VariantRecalibrator. The first tranche (90) has the lowest truth sensitivity but the highest novel Ti/Tv; it is very specific but less sensitive. 187 Each subsequent tranche introduces additional true positive calls along with a growing number of false positive calls. 187 Table 2.5 shows that the 99.0 tranche used in this study has 85,941 known calls and 3,097 novel calls, with 49,447 accessible truth sites. In total, 48,952 calls were made in tranche 99.0. After recalibration, the FILTER column of the resulting file marks each variant as PASS or LowQual.
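The caller intersection reported in Section 2.3.3 amounts to a set operation on variant records keyed by chromosome, position, reference allele and alternate allele. The sketch below illustrates that idea in plain Python; it is not the BCFtools implementation, the helper names are hypothetical, and multi-allelic records are treated as a single ALT string for simplicity. The last lines check that the reported counts are arithmetically consistent.

```python
def variant_key(vcf_line):
    """Key a VCF data line by (CHROM, POS, REF, ALT) for exact allele matching."""
    chrom, pos, _id, ref, alt = vcf_line.rstrip("\n").split("\t")[:5]
    return (chrom, int(pos), ref, alt)


def intersect_calls(tvc_lines, gatk_lines):
    """Return the shared calls and the complements of two VCF line collections."""
    tvc = {variant_key(l) for l in tvc_lines if not l.startswith("#")}
    gatk = {variant_key(l) for l in gatk_lines if not l.startswith("#")}
    return {
        "shared": tvc & gatk,
        "tvc_only": tvc - gatk,
        "gatk_only": gatk - tvc,
    }


# Toy records, assumed for illustration only.
tvc_example = [
    "##fileformat=VCFv4.1",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.",
    "chr1\t200\t.\tC\tT\t50\tPASS\t.",
    "chr2\t300\t.\tG\tA\t50\tPASS\t.",
]
gatk_example = [
    "chr1\t100\t.\tA\tG\t60\tPASS\t.",
    "chr1\t200\t.\tC\tA\t60\tPASS\t.",  # same site, different allele: not shared
    "chr3\t50\t.\tT\tC\t60\tPASS\t.",
]
result = intersect_calls(tvc_example, gatk_example)

# Consistency of the counts reported in the text and Figure 2.5.
tvc_total, gatk_total, shared_total = 109_503, 238_530, 94_263
tvc_only_total = tvc_total - shared_total
gatk_only_total = gatk_total - shared_total
```

Keying on the alleles as well as the position is what makes the match "exact": two callers reporting different alternate alleles at the same site fall into the complements rather than the intersection.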

Figure 2.6: Genome Analysis Toolkit VariantRecalibrator tranche plot
X-axis: the number of novel variants called. Y-axis: the novel transition to transversion ratio and the overall truth sensitivity. TP (true positive): exact match of non-reference genotype. FP (false positive): additional alternate allele in WES genotype.

Table 2.5: Genome Analysis Toolkit VariantRecalibrator tranche results

Tranche | minVQSLOD | Known | Novel | Truth sites | Called
,136 at ,254 at ,447 accessible 44,
,941 at ,097 at ,447 accessible 48,
,789 at ,200 at ,447 accessible 49,
,975 at ,528 at ,447 accessible 49,447
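The idea behind a truth-sensitivity tranche can be sketched as follows: pick the lowest VQSLOD cut-off that still retains a target fraction of the truth-site variants, then label every call against that cut-off. This is a simplified illustration under stated assumptions, not GATK's actual tranche computation; the function names and the VQSLOD values are hypothetical.

```python
import math


def tranche_threshold(truth_vqslods, target_sensitivity):
    """Lowest VQSLOD cut-off that keeps the target fraction of truth-site variants."""
    if not truth_vqslods:
        raise ValueError("at least one truth-site score is required")
    ranked = sorted(truth_vqslods, reverse=True)
    # Small epsilon guards against floating-point noise in the product.
    keep = max(1, math.ceil(target_sensitivity * len(ranked) - 1e-9))
    return ranked[keep - 1]


def apply_filter(variants, threshold):
    """Label each (name, vqslod) pair PASS or LowQual against the cut-off."""
    return [(name, "PASS" if score >= threshold else "LowQual")
            for name, score in variants]


# Hypothetical VQSLOD scores of variants at truth sites.
truth_scores = [9.0, 8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0, -1.0, -5.0]
cutoff = tranche_threshold(truth_scores, 0.8)  # keep 80% of truth sites

calls = [("var_a", 3.5), ("var_b", 2.0), ("var_c", -0.2)]
labelled = apply_filter(calls, cutoff)
```

Raising the target sensitivity lowers the cut-off, which admits more true positives together with more false positives; this is the trade-off the tranche plot visualises.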

Figure 2.7 shows the 2D projection of the mapping quality rank sum (MQRankSum) test versus haplotype score, obtained by marginalising over the other annotation dimensions in the model. The mapping quality rank sum test is the u-based z-approximation from the Mann-Whitney Rank Sum Test 188 for mapping qualities, comparing reads that carry the reference base with reads that carry the alternate allele. 187 This measure can be used to evaluate the likelihood of SNPs being real.

Figure 2.7: Genome Analysis Toolkit VariantRecalibrator projection for mapping quality rank sum (MQRankSum) versus haplotype score
The upper left panel shows the probability density function that was fitted to the data. Green: high quality. Red: lowest quality. The remaining three panels are scatter plots in which each single nucleotide polymorphism (SNP) is plotted in the two annotation dimensions (MQRankSum and HaplotypeScore) as a point cloud. In the upper right panel, SNPs are coloured black and red to show which SNPs are retained and filtered, respectively, by the variant quality score recalibration procedure. The lower left panel colours SNPs green, grey, and purple to give a sense of the distribution of the variants used to train the model. Green SNPs: found in the training sets. Purple: given the lowest probability of being true. The lower right panel colours each SNP by its known/novel status. Blue: known SNPs. Red: novel SNPs.
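The rank sum statistic described above can be illustrated with a short sketch. It ranks the pooled mapping qualities (averaging ranks over ties), computes the Mann-Whitney U for the alternate-allele reads, and converts it to a z-score with the normal approximation. This is an assumption-laden illustration rather than GATK's exact implementation: no tie correction is applied to the variance, and the sign convention (negative when alternate reads have lower mapping quality) is chosen here for readability.

```python
import math


def rank_sum_z(alt_values, ref_values):
    """Normal-approximation z-score of the Mann-Whitney U for the alt group."""
    n_alt, n_ref = len(alt_values), len(ref_values)
    if n_alt == 0 or n_ref == 0:
        raise ValueError("both groups need at least one observation")
    pooled = sorted(
        [(v, True) for v in alt_values] + [(v, False) for v in ref_values],
        key=lambda pair: pair[0],
    )
    rank_sum_alt = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values occupy ranks i+1 .. j; each receives the average rank.
        average_rank = (i + 1 + j) / 2.0
        rank_sum_alt += average_rank * sum(1 for k in range(i, j) if pooled[k][1])
        i = j
    u_alt = rank_sum_alt - n_alt * (n_alt + 1) / 2.0
    mean_u = n_alt * n_ref / 2.0
    sd_u = math.sqrt(n_alt * n_ref * (n_alt + n_ref + 1) / 12.0)
    return (u_alt - mean_u) / sd_u


# Hypothetical mapping qualities: alt-supporting reads map worse than ref reads.
z_biased = rank_sum_z([20, 20, 20], [60, 60, 60])
# Identical distributions give a z-score of zero.
z_neutral = rank_sum_z([60, 60], [60, 60])
```

A strongly negative value suggests that reads supporting the alternate allele are systematically harder to map, which is a typical signature of a mapping artefact rather than a real SNP.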

2.3.5 Genotype concordance

A total of 212 positions across three previously genotyped individuals were used to compare genotype calls from WES and the Agilent HaloPlex custom panel (Figure 2.8). Of those 212 positions, 77 were not called in the WES data due to low coverage or the position of the primers. Of the remaining 135 positions, 123 calls (91%) were concordant between the two data types and 12 calls (9%) were discordant. Of the 12 discordant calls, two were in Patient 3-III-1 and were called as 1/1 using the Agilent HaloPlex custom panel data and as 0/1 in the WES data. The remaining ten discordant calls were all in Patient 2-II-1 and were called as 0/0 from the Agilent HaloPlex custom panel data and as either 0/1 (6 calls) or 1/1 (4 calls) using the WES data. Both concordant and discordant calls were kept in the intersect file. The genotyped positions were all located in easy-to-map regions of the genome and may not reflect the true false positive and false negative rates for all positions.
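The concordance figures above follow from a simple tally. The sketch below shows one way to compute it, assuming genotypes are held in dictionaries keyed by position; the function names and the toy genotypes are hypothetical, while the final lines reproduce the percentages from the counts reported in the text.

```python
def normalise(genotype):
    """Order the alleles so that 1/0 and 0/1 compare as equal."""
    return "/".join(sorted(genotype.replace("|", "/").split("/")))


def concordance(panel_calls, wes_calls):
    """Tally positions that are not called, concordant or discordant in WES."""
    not_called = concordant = discordant = 0
    for position, panel_gt in panel_calls.items():
        wes_gt = wes_calls.get(position)
        if wes_gt is None or "." in wes_gt:
            not_called += 1
        elif normalise(wes_gt) == normalise(panel_gt):
            concordant += 1
        else:
            discordant += 1
    compared = concordant + discordant
    rate = concordant / compared if compared else float("nan")
    return {"not_called": not_called, "concordant": concordant,
            "discordant": discordant, "rate": rate}


# Toy genotypes, assumed for illustration only.
panel = {("chr1", 100): "0/1", ("chr1", 200): "1/1",
         ("chr2", 300): "0/0", ("chr2", 400): "0/1"}
wes = {("chr1", 100): "1/0", ("chr1", 200): "0/1", ("chr2", 300): "./."}
summary = concordance(panel, wes)

# Counts reported in the text: 212 positions, 77 not called, 123 concordant.
compared_total = 212 - 77
concordant_pct = round(100 * 123 / compared_total)
discordant_pct = round(100 * 12 / compared_total)
```

Excluding uncalled positions from the denominator, as the text does, means the rate describes agreement only where both methods produced a genotype.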

Total positions compared: 212
Not called in whole exome sequencing data: 77
Concordant: 123
Discordant: 12
  Called as variant by Agilent HaloPlex: 2
  Called as variant by Ion Proton: 10

Figure 2.8: Concordance of genotype calls between the Agilent HaloPlex custom panel and whole exome sequencing on Ion Proton for three patients
Blue: called homozygous alternate (1/1) by Agilent HaloPlex custom panel but called heterozygous (0/1) by Ion Proton whole exome sequencing. Green: called variant (0/1 or 1/1) by Ion Proton whole exome sequencing but called homozygous reference (0/0) by Agilent HaloPlex custom panel.

Table 2.6 shows the ten positions in Patient 2-II-1 at which the genotype calls are discordant between the Agilent HaloPlex custom panel and the WES genotype, that is, where a variant is called (0/1 or 1/1) in the intersect file but no variant is called by the Agilent HaloPlex custom panel. The genotype calls for Patient 2-II-1 were checked in the TVC *.vcf, the GATK *.vcf and the intersect file, and were the same across the three files at all ten positions. The WES genotypes of both parents of Patient 2-II-1 (Patient 2-I-1 and Patient 2-I-2) are included in the last two columns of Table 2.6. Given the genotypes of both parents, the WES genotype calls for Patient 2-II-1 at these positions are likely correct.

Table 2.7 shows the two discordant variants for Patient 3-III-1, both of which were called as homozygous alternate using the Agilent HaloPlex custom panel but as heterozygous in the intersect file. The genotype calls for these two positions were checked in the TVC *.vcf file, the GATK *.vcf file and the intersect file. TVC called the first variant (chromosome 7) as 1/1 whereas GATK called it as 0/1, and the position is therefore called as 0/1 in the intersect file. TVC and GATK both called the second variant (chromosome 13) as 1/1, yet the intersect file records it as 0/1. For both variants, the parents of Patient 3-III-1 (last two columns) have a homozygous alternate genotype call. On visual inspection of the *.bam files in IGV, Patient 3-III-1 also appears to be homozygous for the alternate allele at these positions. It therefore appears that the errors in these variant calls arose when the *.vcf files were intersected.
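The reasoning from parental genotypes used above is a Mendelian consistency check: a child's genotype is plausible only if one allele can come from each parent. A minimal sketch follows, assuming diploid genotypes written as 0/1-style strings; the function names are hypothetical. The examples use the last row of Table 2.6 and the parental genotypes described for Table 2.7.

```python
from itertools import product


def alleles(genotype):
    """Split a diploid genotype string such as 0/1 or 0|1 into its alleles."""
    return genotype.replace("|", "/").split("/")


def mendelian_consistent(child, father, mother):
    """True if the child can inherit one allele from each parent."""
    child_alleles = sorted(alleles(child))
    return any(
        sorted([paternal, maternal]) == child_alleles
        for paternal, maternal in product(alleles(father), alleles(mother))
    )


# Patient 2-II-1, last row of Table 2.6: child 0/1, both parents 0/1.
family2_ok = mendelian_consistent("0/1", "0/1", "0/1")

# Patient 3-III-1: intersect file says 0/1, but both parents are 1/1.
family3_intersect_ok = mendelian_consistent("0/1", "1/1", "1/1")
# The panel call (1/1) is the one compatible with the parents.
family3_panel_ok = mendelian_consistent("1/1", "1/1", "1/1")
```

The check cannot prove a call correct, but an inconsistent trio, as with the 0/1 calls for Patient 3-III-1, points to a genotyping or processing error in at least one family member.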

Table 2.6: Discordant genotype calls between the Agilent HaloPlex custom panel and whole exome sequencing for Patient 2-II-1

Chr | Position | Ref | Alt | Agilent HaloPlex GT | Intersect file GT | 2-I-1 (Father) | 2-I-2 (Mother)
C T 0/0 0/1 0/0 1/
T C 0/0 1/1 1/1 0/
A G 0/0 0/1 0/1 (low reads) 1/
A G 0/0 1/1 1/1 1/
A G 0/0 0/1 0/1 (low reads) 1/1 (low reads)
T C 0/0 1/1 0/1 (low reads) 1/
G C 0/0 1/1 0/1 1/
A G 0/0 0/1 0/1 0/
G A 0/0 0/1 0/1 0/
C T 0/0 0/1 0/1 0/1

Chr: chromosome. Ref: reference allele. Alt: alternate allele. GT: genotype. Low reads: less than 10 reads at this position.

Table 2.7: Discordant genotype calls between the Agilent HaloPlex custom panel and whole exome sequencing for Patient 3-III-1

Chr | Position | Ref | Alt | Agilent HaloPlex GT | Intersect file GT | 3-II-1 (Father) | 3-II-2 (Mother)
T C 1/1 0/1 1/1 1/
G C 1/1 0/1 1/1 1/1

Chr: chromosome. Ref: reference allele. Alt: alternate allele. GT: genotype.

2.4 Discussion

Evaluation of families used in this study

It has long been recognised that cancer has a familial component. Genetic studies were traditionally performed on sets of related individuals, including Mendel's study of inheritance patterns from parents to offspring in pea plants, which proposed the underlying mechanisms of inheritance. 189 Pedigree studies have been used successfully to identify genes influencing a broad range of monogenic, highly penetrant traits. 161

There are several reasons why family studies are used for gene discovery. Firstly, pedigrees are likely to represent a more homogeneous and limited set of causal genes, which enhances the statistical power for gene discovery. 190 Secondly, clinical characteristics that are shared among family members also reduce heterogeneity in the analysis. 190 Thirdly, the analysis of phenotypes among family members is controlled to some extent for both genetic background and environmental exposures, 190 so the background genetic variation is also controlled to some extent. Finally, family data allow a deeper level of genotyping quality control than is possible in studies of unrelated individuals. 190 There are also disadvantages to using families in genetic research. It can be more costly to recruit entire pedigrees than unrelated individuals. 190 However, the analysis of disease or trait segregation in pedigrees with known genetic markers has proven to be a robust approach to gene discovery.

The study of familial cancer predisposition syndromes characterised by sarcoma probands has provided valuable insight into cancer biology and genetic risk. For example, the study of Li-Fraumeni syndrome defined the roles of the tumour suppressor gene TP53 in the development of cancer. Since germline mutations in the TP53 gene were first identified in Li-Fraumeni syndrome families, the gene has also been implicated in the sporadic form of most cancers. 51 It is now known that the TP53 gene has a role in the regulation of the cell cycle, DNA repair, apoptosis, cellular metabolism, and senescence. 191 These findings have had a significant impact on the clinical management of familial cancer predisposition syndromes and cancers in general.

The ascertainment of cancer cluster families through a sarcoma proband has also been used to study the incidence and distribution of cancers in relatives of sarcoma probands in families not defined by known syndromes. These studies found an increased cancer risk in relatives of sarcoma probands and suggest the presence of shared underlying genetic risk variants independent of known cancer predisposition syndromes. 195, 196 The families selected for investigation in the current study fall into this category: they were not defined by a known cancer predisposition syndrome and therefore represent an opportunity to identify novel risk variants associated with both sarcoma and cancer risk.

The ISKS families selected for WES in the current study include sarcoma, prostate cancer and melanoma cases. The occurrence of these cancers in families has previously been reported in familial cancer syndromes such as Li-Fraumeni syndrome 51, 52, 198, 199 and familial atypical multiple mole melanoma (FAMMM) syndrome (characterised by mutations in the CDKN2A gene), as well as in other non-FAMMM syndrome families that were also found to have mutations in the CDKN2A gene. 202, 204 However, the three families selected do not have mutations in the CDKN2A gene and therefore represent an opportunity to identify novel genetic variants that may lead to the development of these cancers within a family.

The number and size of pedigrees vary widely in genetic studies of familial cancer. The number of relatives can range from two family members to extended pedigrees with more than 30 individuals. 205, 206 The families used in this study are similar in size to the families studied by Roach et al. (2010) to discover the causative gene for Miller syndrome and by Shi et al. (2014) to identify rare POT1 variants in familial cutaneous malignant melanoma. 207, 208

The use of whole exome sequencing to identify disease causing variants

WES has been a powerful approach for identifying genes that underlie Mendelian disorders and complex traits. 141, 144, 209, 210 To date, most of the genes discovered to underlie rare Mendelian disorders carry genetic variation in protein coding sequences that is predicted to have functional consequences and to be deleterious. 166, 211

WES has also been a powerful and efficient approach for the discovery of genetic mutations in various cancers, identifying more than 50 novel tumour-predisposing genes (Appendix B). The identification of clinically actionable driver mutations through WES has enabled the development of precision oncology therapies. Many of the genes that have been implicated in hereditary sarcomas play a significant role in the cellular response to DNA damage, which has led to the development of DNA repair targeted therapies. 216, 217 WES has the advantage of increased coverage of regions of interest (exons) at lower cost and higher throughput compared with current whole genome sequencing (WGS). 148 WES was therefore chosen for this study as an appropriate, affordable and robust in-house method. 139, 210

Limitations of whole exome sequencing

A weakness of WES is that it largely ignores variants residing in non-coding and intergenic regions that can affect gene expression. 218 Non-coding DNA plays an important role in gene regulation and 3D chromatin folding. 219 However, the effects of non-coding variants on gene expression are not yet completely understood. 220 The effects of regulatory variation may be more subtle, and may be more important in common complex diseases such as cancer than in Mendelian diseases. 221 The relevance of regulatory variation to cancer susceptibility in humans is unclear, but it is possible that polymorphisms in non-coding regions have an important role. 221, 222 As the costs of WGS decrease and analytical resources such as the Encyclopedia of DNA Elements (ENCODE) 223 become more adept at interpreting the effects of non-coding variants, WGS will become more widespread. The use of WGS studies to investigate genetic variants in cancer cluster families may lead to the discovery of mutations in regulatory elements that add to the pool of disease-associated variants. 224

Structural variations (defined as DNA sequence alterations other than SNVs, including insertions, deletions, duplications, inversions and translocations) were not examined using WES in this chapter. There are many challenges in somatic structural variation detection, inherent in the limitations of NGS technologies, the complexities of tumour samples and the difficulties in structural variant reconstruction. 227 As WGS technologies improve, the use of paired-end reads, deeper coverage and longer sequence reads will facilitate the examination of somatic structural variants in cancer.

The Ion Proton sequencing platform

The Ion Proton generally shows similar performance to other high-throughput sequencing platforms. 228, 229 The Ion Proton is also known to produce high quality data at a comparable average depth and read length, with a faster turnaround time, compared to the Illumina HiSeq. 172, 229, 230

The average percentage of reads on target produced in this study was 95.08%. The percentage of reads on target is the ratio of the number of reads within a target region to the total number of reads output by the sequencer. Off-target regions are those areas located 5′ and 3′ to target regions (upstream, downstream, untranslated regions and intronic). The percentage of on-target reads depends on the platform used, as each platform uses different target choices, bait lengths, bait density and molecules for capture.

Base calling software

The TVC software was developed specifically to call Ion Proton sequencing data. However, it cannot produce multi-sample variant call files. The advantage of multi-sample calling in cohort analysis is that non-variant genotypes can be separated into homozygous reference genotypes and missing genotypes. 149, 231 Multi-sample variant calling reduces the probability of calling random sequencing errors and increases the likelihood of calling alleles of low frequency or low coverage in a single sample. 149 Therefore, the sensitivity and accuracy of base calling are improved. 149 When calling the samples individually using TVC, many positions had to be filled to reference homozygous, and it was impossible to distinguish missing from homozygous reference positions.
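The filling of unknown positions discussed above can be sketched as a pass over a merged multi-sample VCF that rewrites missing genotypes as 0/0. This is an illustration of the operation only, not the BCFtools code, and the function name and toy record are hypothetical. It also makes the caveat explicit: after filling, a position that was never covered is indistinguishable from a true homozygous reference call.

```python
MISSING_GENOTYPES = {".", "./.", ".|."}


def fill_missing_to_reference(vcf_lines):
    """Rewrite missing sample genotypes as 0/0; return new lines and a count."""
    filled = 0
    output = []
    for line in vcf_lines:
        if line.startswith("#"):
            output.append(line)
            continue
        fields = line.rstrip("\n").split("\t")
        format_keys = fields[8].split(":")
        if "GT" not in format_keys:
            output.append(line)
            continue
        gt_index = format_keys.index("GT")
        for column in range(9, len(fields)):
            sample = fields[column].split(":")
            if gt_index < len(sample) and sample[gt_index] in MISSING_GENOTYPES:
                sample[gt_index] = "0/0"  # assumption: uncalled means reference
                fields[column] = ":".join(sample)
                filled += 1
        output.append("\t".join(fields))
    return output, filled


# Toy merged record with three samples, assumed for illustration only.
merged = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t./.:.\t0/1:30\t.",
]
filled_lines, filled_count = fill_missing_to_reference(merged)
```

Multi-sample callers avoid this assumption because they evaluate every sample at every candidate site and can report a genuine no-call where the evidence is insufficient.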
GATK UnifiedGenotyper can perform multi-sample calling and can therefore distinguish between missing and reference homozygous positions. However, GATK is not well suited to Ion Proton data, as the Ion Proton platform produces markedly different data from the Illumina platform. 232 GATK UnifiedGenotyper returned more than twice as many variants (238,530) as TVC (109,503). Anecdotally, GATK produces a higher number of false positives on Ion Proton data (up to 10 times as many, according to reports on online bioinformatics forums), which may account for the difference in the number of variants called.

An intersect file of the calls made by TVC and GATK UnifiedGenotyper was created to reduce the number of false positives in the final call set, and to overcome both the single-sample calling of TVC and the platform differences affecting GATK. Previous studies have recommended using multiple callers to generate a final call set. 233, 234 A simple way to combine call sets is to take the intersection or union of calls as the final calls. 234 However, the intersection is a very stringent approach: it reduced the number of variants from 109,503 called by TVC and 238,530 called by GATK UnifiedGenotyper to just 94,263 in the intersect file. Some true variants may therefore have been excluded as a result of using the intersect file. Nevertheless, this may be the best approach for reducing the number of false positive calls.

Concordance

In this study, the concordance rate of genotype calls for 135 positions from WES and the Agilent HaloPlex custom panel was 91%. This rate falls within the range reported in the previous literature on the concordance of panel versus sequencing data. A previous study by Motoike et al. (2014) aimed to validate SNV calls by exome analysis. They sequenced 12 independent genomes from Japanese patients using the Ion Proton semiconductor sequencer for whole exome sequencing (average depth 109). 235 Reads were aligned to hg19 using TMAP and genotype calling was performed on each sample using TVC. 235 Single nucleotide polymorphism (SNP) calls based on the Illumina Human Omni (version 2.5-8) SNP chip data were used as the reference.
They analysed a total of 79,143 SNPs on the autosomes and found the concordance rate between the Omni and Ion Proton calls to be %. 235 These figures are comparable to results reported in a previous study

The intersect file described in this chapter was used in Aim 2 of this study to identify candidate risk variants. None of the discordant calls were removed from the intersect file. However, because of the findings of the concordance analysis, particularly the wrong call found in the intersect file but in neither of the original *.vcf files from TVC or GATK, each variant detected in the analysis of these data was visually verified in the *.bam file using IGV.


Chapter 3

Aim 2: Identification of candidate germline risk variants in three cancer cluster families

3.1 Introduction

Whole exome sequencing (WES) generates data on a large number of variants, most of which are not relevant to the disease of interest because they do not have a functional effect at the protein or systemic level. 236 The second aim of this study was to use the WES data described in Chapter 2 to identify candidate germline risk variants that segregate with cancer or sarcoma in three cancer cluster families. The analysis of WES data requires comprehensive computational approaches and strategies to identify candidate risk variants or genes for a disease of interest. Despite advances in sequencing platform technology, reference data sets, software, and analysis pipelines, there is no gold standard for the filtering and prioritisation of variants. However, many guidelines, tools, and online resources have been developed to assist in the identification of functional variants from WES.

3.2 Bioinformatic strategies for variant filtering and prioritisation in whole exome sequencing

Annotation

As the sequencing of cancer genomes can reveal thousands of mutations, an essential step in the interpretation of WES data is the annotation of variants and their potential effects on genes and transcripts. 240 Variant annotation is the process of assigning functional information to DNA variants. At a basic level, annotations identify genes, transcripts and genomic regions; at a higher level, they also predict the impact of the variant on the protein product. There are over 80 bioinformatic tools available for genomic annotation, many of which are available as web-based applications. 241 Most tools focus on the annotation of single nucleotide variants (SNVs) as they are easily identified and analysed. 242 However, an increasing number of tools are being developed to annotate copy number alterations and other structural variations, including insertions, deletions, inversions and translocations. 241 The most common form of annotation is the provision of links to public databases such as the National Center for Biotechnology Information (NCBI) Short Genetic Variations Database (dbSNP) or the 1000 Genomes Project. 251, 252 The functional prediction of variants can result from a simple sequence-based analysis, a region-based analysis, or an evaluation of the structural impact on proteins. 242 The choice of annotation tool depends largely on which variant annotations are required. A widely used annotation tool for identifying the functional consequence of sequence variation is Annotate Variation (ANNOVAR). 245 ANNOVAR predicts the functional effects of variants on genes, performs genomic region-based annotation, and compares variants to existing databases. 245 ANNOVAR incorporates scores based on evolutionary conservation and in silico prediction of functional consequences.

Annotation of non-coding regions

A significant portion of the reads obtained in WES come from outside the designed target region. 253 In a typical WES study, approximately 40-60% of the reads are off target, and all or most of these off-target reads are usually ignored. Three main types of off-target reads are found in WES data: reads from introns and intergenic regions, reads from the mitochondrial genome and reads from viral genomes. 218 Although WES is not designed to identify regulatory variants in intronic and intergenic regions, off-target reads should not be discarded, as many changes outside the coding regions may be responsible for disease phenotypes. 253 Annotation also plays an essential role in the interpretation of off-target variants. Regulome Database (RegulomeDB) can be used to guide the interpretation of regulatory variants in the human genome by identifying potential regulatory changes based on experimental data sets from the Encyclopaedia of DNA Elements (ENCODE) and other sources. 257 RegulomeDB also includes computational predictions and manual annotations to identify putative regulatory potential and functional variants. 257 RegulomeDB uses a heuristic scoring system based on the functional consequence of the variant.

Variant class filtering

Variant filtering can be carried out using annotations for the genomic location and the variant class. Annotations from ANNOVAR can be used to identify intronic variants, exonic variants, intergenic variants, 5′- and 3′-untranslated region (UTR) variants, splicing site variants, and upstream or downstream variants. 245 For exonic variants, ANNOVAR scans annotated messenger ribonucleic acid (mRNA) sequences to identify and report amino acid changes, as well as stop-gain or stop-loss mutations. 245 Exonic missense, nonsense, stop-loss, frameshift and splice site variants all have the potential to affect protein function and are retained during this filtering process. 211, 239 RegulomeDB scores can also be used to filter for variants that are more likely to lie in a functional location.

3.2.3 Population frequency filtering

Population frequency is one of the primary criteria for predicting whether a variant is likely to have a functional effect on the encoded protein. 258 A rare nonsense variant might be expected to have a larger functional impact than a frequently occurring one. 211, 259 The Exome Aggregation Consortium (ExAC) database is the biggest catalogue of protein-coding genetic variation to date and is intended to be used as a general population resource for filtering variants by, for example, minor allele frequency (MAF). 260, 261 The ExAC database is the aggregation and analysis of high-quality exome DNA sequence data for 60,706 individuals of diverse ancestries. 261 The ExAC database is recommended because its allele frequencies are calculated from considerably more samples than those of the Exome Variant Server and the 1000 Genomes Project. 252, 260 In disease studies, a commonly used starting point for filtering is to remove variants with a MAF > 1%.

Evolutionary conservation

Genomic Evolutionary Rate Profiling (GERP) uses a comparative genomics approach to identify putatively functional sequences by comparing similarity across divergent species to find sequences that have been maintained during evolution. 262 Pathogenic mutations tend to occur at markedly more conserved positions than benign variants. 263, 264 GERP uses maximum likelihood evolutionary rate estimation for position-specific scoring. 262 GERP scores range from a maximum of 6.18 to a below-zero minimum (-12.36). Positive scores represent a substitution deficit (expected for sites under selective constraint), while negative scores represent a substitution surplus.

Functional impact prediction

In silico analysis of the functional consequences of a variant on protein function and estimates of evolutionary conservation are often used for prioritisation in genetic discovery studies.
Non-synonymous variants that lead to an amino acid change in the protein product are of particular interest, as amino acid substitutions account for approximately half of the known genetic variants responsible for human inherited disease. 265 Sorting Intolerant From Tolerant (SIFT) and Polymorphism Phenotyping-2 (PolyPhen-2) are commonly used tools that can predict whether an amino acid substitution will have an effect on protein function. 266, 267 SIFT uses sequence homology to predict whether an amino acid substitution will affect protein function and potentially alter phenotype. 266 A SIFT score ≤ 0.05 is predicted to be damaging, and a score > 0.05 is predicted to be tolerated. PolyPhen-2 predicts the possible impact of amino acid substitutions on the stability and function of human proteins using structural and comparative evolutionary considerations. 267 A PolyPhen-2 score between 0.0 and 0.15 is predicted to be benign, a score between 0.15 and 1.0 is predicted to be possibly damaging, and a score between 0.85 and 1.0 is more confidently predicted to be damaging. 267

An alternative strategy for filtering variants is based on a priori knowledge of the functional involvement of variants or genes. For example, association studies with candidate genes have been used to identify a number of risk genes for complex diseases. 268 A candidate gene study takes advantage of, and is limited by, knowledge of the phenotype, tissues, genes and proteins that are likely to be involved or have been previously implicated in the disease. 268, 269 Assessing candidate genes that carry functional variants in the context of existing biomedical knowledge and known biomolecular functions can produce a manageable set of variants for further validation or exploration. 239 Several next generation sequencing (NGS) studies have identified rare variants associated with disease using a candidate gene approach.

In addition to variant filtering based on annotation and functional impact predictions, strong genetic support is also necessary for assigning possible causality to variants identified using WES. 239 Evidence of genetic association or familial segregation should be supplemented by functional and bioinformatics support.
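The SIFT and PolyPhen-2 score ranges quoted above can be written as two small classification functions. This is a sketch under stated assumptions: the function names are hypothetical, the label "probably damaging" is used for the 0.85 to 1.0 range, and the handling of scores that fall exactly on a boundary (0.15 or 0.85) is an assumption because the ranges in the text overlap at their end points.

```python
def sift_prediction(score: float) -> str:
    """SIFT: scores at or below 0.05 are predicted damaging, above are tolerated."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("SIFT scores lie between 0 and 1")
    return "damaging" if score <= 0.05 else "tolerated"


def polyphen2_prediction(score: float) -> str:
    """PolyPhen-2: benign, possibly damaging, or probably damaging.

    Boundary handling (0.15 counted as benign, 0.85 as possibly damaging)
    is an assumption made for this sketch.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("PolyPhen-2 scores lie between 0 and 1")
    if score > 0.85:
        return "probably damaging"
    if score > 0.15:
        return "possibly damaging"
    return "benign"


# Example scores (illustrative values, not from the study data).
for sift, pph2 in [(0.01, 0.99), (0.30, 0.50), (0.80, 0.05)]:
    print(sift, sift_prediction(sift), pph2, polyphen2_prediction(pph2))
```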

3.2.6 Association analysis in families

Association analysis in families can identify genes that influence complex human traits and provide protection against population stratification. 275 Variance components models are a way to assess the amount of variation in a dependent variable that is associated with one or more random-effects variables. 276 Variance components analysis is widely used in the genetic analysis of quantitative traits in family studies. 275 This approach is favoured because it can accommodate pedigrees of any size, allows both linkage and association analysis, and tends to be more robust than competing approaches. 275 Sequential Oligogenic Linkage Analysis Routines (SOLAR) is a software package that performs variance components analysis in pedigrees. 277 Almasy and Blangero (1998) 278 extended the strategy developed by Amos (1994) 279 for pedigree-based variance components analysis to estimate the genetic variance attributable to the region around a specific genetic marker using SOLAR. Maximum likelihood methods that take into account relationships among family members can be used to determine association in a polygenic model in SOLAR.

Familial segregation

Segregation analysis is a general method for evaluating the transmission of a disease or trait within pedigrees. Segregation analysis can be used to prioritise and filter variants by assessing the co-segregation of candidate variants with disease status. 276 This analysis distinguishes variants that segregate with the disease of interest and are absent in unaffected family members. Segregation analysis can be applied to any pedigree structure and works with both qualitative and quantitative traits.

Outline of chapter

This chapter describes the annotation, filtering, prioritisation and segregation analysis of WES data to identify putative germline risk variants that are associated with cancer or sarcoma in three cancer cluster families.
WES data from Chapter 2 were annotated using ANNOVAR and RegulomeDB. Putative structural and regulatory variants were filtered using genomic location and variant class or RegulomeDB score. Three different strategies were used to further prioritise rare private variants, known rare variants and candidate gene variants. Prioritised variants were tested for association with sarcoma and cancer using SOLAR. Significant variants were assessed for familial segregation with disease.

3.3 Methods

Ascertainment bias correction

The families selected for this study were ascertained from the International Sarcoma Kindred Study (ISKS), 175 as described previously in Chapter 2. A weighted covariate using a probability unit (probit) regression was created in R 281 (bias reduction in binomial-response generalised linear models (brglm) library, version 3.1.2) 281 to account for ascertainment bias in the sample. Probit regression assigns a weight to each individual based on their case status, and this weight can be used as a covariate in modelling.

Intersection

The intersect file created in Chapter 2 from the variant call files from the Torrent Variant Caller (TVC, version 5.0.0) and the Genome Analysis Toolkit (GATK, version 3.4.0) UnifiedGenotyper was used in these analyses. This file consists of 94,263 variants.

Annotation and filtration

ANNOVAR (version 2015Jun16) 245 was used to annotate the intersect file using gene-based annotation. Using the ANNOVAR annotation, variants were filtered to include only putative structural variants. Variant filtering retained loci if they: (1) were exonic, (2) were predicted to be nonsynonymous or to result in a stop gain or stop loss, (3) were predicted to be deleterious or probably damaging in SIFT and PolyPhen-2, and (4) had a GERP score > 3.
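The four retention criteria for putative structural variants can be expressed as a single predicate. The sketch below is illustrative only: the `Variant` record, its field names and the label values are hypothetical and do not reproduce ANNOVAR's actual output columns. The GERP comparison assumes that conserved sites (scores above 3) are the ones retained, which agrees with the GERP values above 3 reported for the prioritised variants in Tables 3.4 and 3.5.

```python
from dataclasses import dataclass
from typing import Optional

# Exonic consequences retained by criterion (2); label spellings are assumptions.
RETAINED_FUNCTIONS = {"nonsynonymous", "stopgain", "stoploss"}


@dataclass
class Variant:
    """Hypothetical annotated variant record (field names are illustrative)."""
    region: str               # e.g. "exonic", "intronic"
    exonic_function: str      # e.g. "nonsynonymous", "synonymous"
    sift: str                 # "D" = deleterious, "T" = tolerated
    polyphen2: str            # "D" = damaging, "B" = benign
    gerp: Optional[float]     # conservation score, None if not annotated


def is_putative_structural(v: Variant, gerp_threshold: float = 3.0) -> bool:
    """Apply criteria (1) to (4); every criterion must hold."""
    return (
        v.region == "exonic"                          # (1) exonic
        and v.exonic_function in RETAINED_FUNCTIONS   # (2) protein-altering
        and v.sift == "D"                             # (3) deleterious in SIFT
        and v.polyphen2 == "D"                        #     and in PolyPhen-2
        and v.gerp is not None
        and v.gerp > gerp_threshold                   # (4) conserved position
    )


# Illustrative records only.
kept = Variant("exonic", "nonsynonymous", "D", "D", 5.37)
dropped = Variant("exonic", "synonymous", "T", "B", 1.2)
print(is_putative_structural(kept), is_putative_structural(dropped))
```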

All remaining variants that were not classified as putative structural variants were annotated using RegulomeDB. 257 Putative regulatory variants that had a RegulomeDB score of 1a, 1b, 1c, 1d, 1e, 1f, 2a, 2b or 2c were retained, as these scores represent the highest confidence that a variant lies within a functional location. Table 3.1 shows the classification of scores from RegulomeDB. Known expression quantitative trait loci (eQTL) for genes are associated with expression and are most likely to result in a functional consequence. 257 Other subcategories with high confidence for regulatory variants are transcription factor (TF) binding, TF motifs, Deoxyribonuclease (DNase) footprints and DNase peaks. 257

Table 3.1: Classification of Regulome database scores

Score   Supporting data
1a      eQTL + TF binding + matched TF motif + matched DNase Footprint + DNase peak
1b      eQTL + TF binding + any motif + DNase Footprint + DNase peak
1c      eQTL + TF binding + matched TF motif + DNase peak
1d      eQTL + TF binding + any motif + DNase peak
1e      eQTL + TF binding + matched TF motif
1f      eQTL + TF binding / DNase peak
2a      TF binding + matched TF motif + matched DNase Footprint + DNase peak
2b      TF binding + any motif + DNase Footprint + DNase peak
2c      TF binding + matched TF motif + DNase peak
3a      TF binding + any motif + DNase peak
3b      TF binding + matched TF motif
4       TF binding + DNase peak
5       TF binding or DNase peak
6       Other

eQTL: Expression Quantitative Trait Loci. TF: Transcription Factor. DNase: Deoxyribonuclease.
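The retention rule for putative regulatory variants amounts to a membership test against the nine highest-confidence score categories. A minimal sketch, with hypothetical names and the assumption that scores arrive as short strings such as "2b":

```python
# RegulomeDB score categories retained as putative regulatory variants.
RETAINED_SCORES = frozenset({"1a", "1b", "1c", "1d", "1e", "1f", "2a", "2b", "2c"})


def is_putative_regulatory(score: str) -> bool:
    """Return True when a RegulomeDB score falls in the retained categories."""
    return score.strip().lower() in RETAINED_SCORES


# Scores from each tier of Table 3.1.
for s in ["1a", "2b", "2c", "3a", "4", "6"]:
    print(s, is_putative_regulatory(s))
```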

False positive variants that arise due to misalignment, inaccuracies and biases in the reference sequence can be identified and provisionally excluded during a search for disease-causing variants. Fuentes Fajardo et al. (2012) analysed WES data from 118 individuals in 29 families to create a list of 2,157 genes that are candidates for provisional exclusion from exome analysis. 282 All filtered variants in this study were cross-referenced to the exclusion list by Fuentes Fajardo et al. (available in the paper's Supplementary material: Table S7 gene exclusion list final) to determine whether any results found in polymorphic regions should be excluded to reduce the risk of false positives.

Prioritisation strategies

Prioritisation using a rare private variants strategy

The first prioritisation strategy was applied to the filtered variants from the intersect file to identify rare private variants. Rare private variants are defined as those unique to individuals or families, and those that have not been previously annotated. 283 A major driving hypothesis behind WES of complex diseases is that multiple, rare variants in protein-coding genes contribute to the disease or trait of interest. 284 The focus on rare genetic variation is supported by studies that predict that numerous functional and deleterious variants segregate in the population at frequencies too low (0.5-5%) to detect by genome wide association (GWA) studies. 128 Investigators have successfully used this approach, identifying rare private variants by removing from further consideration known variants with a reference SNP identification (rs ID) that are found in the International HapMap Project (HapMap), 285 the 1000 Genomes Project, 286 or dbSNP. 251 In this study, variants from the intersect file that had been previously annotated were removed to prioritise rare private variants.

Prioritisation using a known rare variants strategy

The second strategy was used to prioritise known rare variants from the filtered intersect file using a population database and MAF information. Filtering the WES data for rare variants that have been documented in a large database such as ExAC makes it more likely that low-frequency variants that may be associated with cancer are prioritised in these cancer cluster families. The full list of variants from the ExAC browser was downloaded (version 0.3.1, 30 August 2016). Variants in this list with a MAF < 0.01 (1%) that were also in the intersect file were selected.

Prioritisation using a candidate gene strategy

The prioritisation of candidate genes based on a priori knowledge of cancer biology was the third prioritisation strategy used on the filtered intersect file in this study. The variants from the intersect file were filtered to prioritise those detected in 119 known cancer and sarcoma genes, including 25 kb upstream and downstream of each gene to capture any potential regulatory variants in off-target reads. Candidate genes were selected from two cancer gene panels and a search of the Online Mendelian Inheritance in Man (OMIM) database. 287 Cancer genes were chosen from the HaloPlex Cancer Research Panel, 288 and Illumina's MiSeq and TruSeq Cancer Panels. 289 Both panels are NGS target enrichment panels that were designed for known cancer hotspots. The panels contain genes found in previous research to be associated with a broad range of cancer types as well as with published drug targets. Candidate genes from the results of a search of the OMIM database for genes known to be associated with the specific sarcoma subtypes in the three families were also included. 287 The full list of cancer genes used in the prioritisation process can be found in Appendix G. The variants present in both the intersect file and the candidate genes were selected.

Methods for testing association of variants with cancer phenotypes

SOLAR (version 7.6.4) 277 was employed to estimate and test the significance of association under a polygenic model for quantitative phenotypes (age at onset of cancer and age at onset of sarcoma) and disease status (cancer and sarcoma).
Covariates included were the age and sex of the participant and the age × sex interactions, along with a weighting factor assigned to each individual to correct for the ascertainment bias. Analysis of disease status as discrete binary traits was performed using a liability threshold model in SOLAR. This model employs probit regression for the mean effect component and a standard random effects variance component model for the residual additive genetic component of variance. 278, 290 As variance component models are highly influenced by kurtosis (a descriptor of the shape of a probability curve), the quantitative phenotypes were inverse normalised using the SOLAR function inorm.

Bonferroni correction

Bonferroni correction was performed on each annotated variant list to correct for multiple testing. 292 Corrections were performed for each method based on the number of variants in the prioritised list. Variants that remained significant after correcting for multiple testing, as well as nominally significant variants (p-value < 0.05), were investigated for co-segregation in the families.

Familial segregation analysis

Three assumptions were used to determine familial segregation. First, the variant will be rare (shared only by cases in one family). Second, every carrier of a putative disease-causing variant will have the phenotype (complete penetrance). Third, every individual with the disorder will carry the putative disease-causing variant (100% probability of observing a genotype given the phenotype). 284 Given these assumptions, it was hypothesised that variants identified by this approach would be private mutations that co-segregate with cancer or sarcoma in each family.
The genotypes of any variants found to segregate with the phenotype of interest were visually confirmed by importing the Binary Alignment/Map (*.bam) files into Integrative Genomics Viewer (IGV, version 183, ) and determining the number of reads for each allele.

Evidence further supporting candidate risk genes

The candidate germline risk variants and the genes in which they arise were further examined for association with cancer pathogenesis using several in silico resources, including the Catalogue of Somatic Mutations in Cancer (COSMIC), the pathway unification database (PathCards), 293 gene ontology (GO) annotations, 294 PubMeth (a database of methylation in cancer), 295 and NCBI. 296 A PubMed search was performed using the string (gene name) AND (cancer OR malignancy OR tumor* OR tumour* OR sarcoma) in April. Abstracts were screened for relevance to the current study.

3.4 Results

Variant prioritisation

The intersect file containing 94,263 variants was annotated with ANNOVAR and RegulomeDB, and variants in known polymorphic regions were removed. Approximately 42% of variants were exonic and 51% were intronic (Table 3.2). Less than 1% of variants were intergenic. Of the exonic variants, approximately 48% were nonsynonymous and 51% were synonymous, with 0.5% classified as stop gain or stop loss variants.

Table 3.2: Functional annotation of intersect file using ANNOVAR

Function                Percentage
Exonic
  Nonsynonymous
  Synonymous
  Stop gain/loss        0.50
  Unknown               1.35
Intronic
Intergenic              0.04
Upstream/downstream     0.68
UTR                     4.96
Other

Prioritisation using a rare private variants strategy

The first prioritisation method was employed to identify rare, novel variants not previously reported in reference data sets. Of the 94,263 variants in the intersect file, 4,425 had not previously been annotated with an rs ID number. Of these, 1,858 (42%) were exonic variants, of which 1,184 (64%) were nonsynonymous.

Prioritisation using a known rare variants strategy

The second prioritisation method was used to identify known rare variants using the ExAC public database. There were over 10 million variants in the ExAC browser (release 0.3.1, 30 March 2016). Of those 10 million variants, 3,686,062 had a MAF of less than 0.01. Of the ~3.7 million rare variants, 8,840 were also in the intersect file. Of these, 5,184 (59%) were exonic, of which 2,815 (54%) were nonsynonymous.

Prioritisation using a candidate gene strategy

The third prioritisation method was based on a priori knowledge of cancer and sarcoma. The variants in the WES intersect file were filtered to only those in or near known cancer and sarcoma genes (1,297 variants). Of these variants, 806 were in the known cancer genes listed in Appendix G. The remaining 491 variants were located in the regions upstream and downstream (25 kb) of each known cancer gene. Appendix H contains a table of the variants in the upstream and downstream regions of the cancer genes that were also prioritised using this method. Of the 1,297 variants, 487 (38%) were exonic, of which 211 (43%) were nonsynonymous.

Summary of annotated variants from each prioritisation strategy

A summary of the annotated variants from each prioritisation strategy is presented in Table 3.3. The first section of the table shows the number of variants prioritised by each strategy, followed by the genomic location of variants, exonic function and functional prediction. The final section of the table shows the number of variants classified as putative structural and functional variants. The results of each prioritisation strategy were tested for significant associations with cancer phenotypes using SOLAR.
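The three prioritisation strategies can be sketched as three independent filters over the same variant list. Everything named here is hypothetical: the `Variant` record, the lookup table of population frequencies keyed by (chromosome, position, ref, alt), the gene coordinate table, and the toy values. The frequency cut-off follows the "less than 0.01" wording in the text, and the 25 kb flank follows the candidate gene description.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str
    rs_id: Optional[str] = None  # None when the variant has no rs ID


Key = Tuple[str, int, str, str]
Region = Tuple[str, int, int]  # chromosome, gene start, gene end


def key(v: Variant) -> Key:
    return (v.chrom, v.pos, v.ref, v.alt)


def rare_private(variants: List[Variant]) -> List[Variant]:
    """Strategy 1: keep variants that have never been given an rs ID."""
    return [v for v in variants if not v.rs_id]


def known_rare(variants: List[Variant], maf: Dict[Key, float],
               max_maf: float = 0.01) -> List[Variant]:
    """Strategy 2: keep variants present in the frequency table with MAF < max_maf."""
    return [v for v in variants if key(v) in maf and maf[key(v)] < max_maf]


def candidate_gene(variants: List[Variant], genes: Dict[str, Region],
                   flank: int = 25_000) -> List[Variant]:
    """Strategy 3: keep variants inside a candidate gene or within its flanks."""
    return [
        v for v in variants
        if any(v.chrom == c and start - flank <= v.pos <= end + flank
               for c, start, end in genes.values())
    ]


# Toy inputs with made-up coordinates, identifiers and frequencies.
variants = [
    Variant("chr1", 1_000, "A", "G"),
    Variant("chr2", 2_000, "C", "T", "rsEXAMPLE1"),
    Variant("chr3", 10_000, "G", "A", "rsEXAMPLE2"),
]
maf_table = {("chr2", 2_000, "C", "T"): 0.004, ("chr3", 10_000, "G", "A"): 0.20}
gene_table = {"GENE_A": ("chr3", 30_000, 40_000)}

print(rare_private(variants))
print(known_rare(variants, maf_table))
print(candidate_gene(variants, gene_table))
```

Keeping the strategies as separate functions mirrors the way each prioritised list is carried forward and corrected for multiple testing on its own.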

91 60 Table 3.3: Summary of variant annotation using Annotate Variation and Regulome Database for each prioritisation strategy Strategy Rare private variants Known rare variants Candidate gene variants Number of variants prioritised 4,425 8,840 1,297 Location Exonic 1,858 5, Intronic 2,170 3, Downstream Upstream untranslated region untranslated region Splicing Non-coding RNA Intergenic Upstream/downstream /3 untranslated region Exonic function Nonsynonymous 1,184 2, Stop gain Stop loss Synonymous 601 2, Unknown

92 Strategy Rare private variants Known rare variants Candidate gene variants Functional prediction Deleterious in SIFT and PolyPhen Tolerated in SIFT and PolyPhen , Unknown in SIFT and PolyPhen-2 3, Regulome database score < Classification Putative structural variants Putative regulatory variants SIFT: Sorting Intolerant From Tolerant. PolyPhen-2: Polymorphism Phenotyping-2. 61

3.4.2 Rare private variants

Association analysis in SOLAR

The annotated rare private variants (Table 3.3) were tested for association with cancer phenotypes using SOLAR. The results from SOLAR were corrected for multiple testing using the Bonferroni method with the number of prioritised variants (4,425). The significance level after correction was α < 1.23 x 10^-5. No variants were significantly associated with a cancer phenotype after correcting for multiple testing. As the variants prioritised by this strategy were rare, novel variants, all nominally significant variants (p-value < 0.05) were visually inspected using IGV to determine whether they could be due to alignment or calling error. Any variants located near an insertion or deletion, or on the edge of a gap or read block, were removed.

Table 3.4 contains a summary of the nominally significant variants (p-value < 0.05) for the age at onset of cancer, the age at onset of sarcoma, and cancer status. The results show eight variants nominally associated with age at onset of cancer, six variants nominally associated with age at onset of sarcoma, and two variants showing nominal association with cancer status. There were no variants with a p-value < 0.05 for sarcoma status. Of the 12 distinct variants, eight were associated with a single cancer phenotype and four were associated with more than one cancer phenotype. Two variants were associated with age at onset of cancer and cancer status, and two variants were associated with age at onset of cancer and age at onset of sarcoma. As these were rare variants without an rs ID, MAF data from the 1000 Genomes Project database and annotation using RegulomeDB were not available. Therefore, all the variants identified by this prioritisation strategy were rare risk alleles.

94 Table 3.4: Summary of SOLAR association results for rare private variants Chr:Pos Gene p-value Beta SE Exonic function SIFT PolyPhen-2 GERP Ref Alt MAF Age at onset of cancer 8: ARHGAP NS D D 5.37 C T : NPPC NS D D 5.29 C G : NELFCD NS D D 5.84 T C : DNAH NS D D 4.05 G A : LRRC16A NS D D 5.94 G A : FGD NS D D 4.29 C A : LTBP Unknown D D 3.77 C A : RIMS NS D D 5.65 G A 0.05 Age at onset of sarcoma 17: FADS6 < NS D D 5.15 A G : DNAH NS D D 4.05 G A : LRRC16A NS D D 5.94 G A : LRRC4B NS D D 3.45 C G : MON1B NS D D 3.61 G T : DHX NS D D 4.89 A G 0.18 Cancer status 8: ARHGAP NS D D 5.37 C T : LTBP Unknown D D 3.77 C A Chr:Pos: Chromosome:Position. SE: Standard Error. SIFT: Sorting Intolerant from Tolerant score. PolyPhen-2: Polymorphism Phenotyping-2. GERP: Genomic Evolutionary Rate Profiling score. Ref: reference allele. Alt: alternate allele. MAF: Minor Allele Frequency in the study population. NS: nonsynonymous. D: deleterious.

Segregation analysis results

Of the 12 variants identified, seven were seen in only one family (ARHGAP39, NELFCD, LTBP4, RIMS1, DNAH9, LRRC16A and FADS6). However, when all three criteria for familial segregation were applied, only one conserved deleterious variant, in the ARHGAP39 gene, showed both nominal association with age at onset of cancer and cancer status and complete familial segregation in family 3 (Figure 3.1). Each family member with cancer was heterozygous at this position, whereas unaffected family members were homozygous for the reference allele. None of the other prioritised rare private variants showed complete familial segregation in any of the families according to the familial segregation criteria.

[Pedigree of family 3. 3-I-1: sarcoma; 3-II-1: prostate; 3-II-2; 3-III-1: sarcoma; 3-III-2. Key: affected male, affected female, unaffected male, unaffected female, proband.]

Patient   Genotype at position in ARHGAP39 gene   Read depth (ref, alt)
3-I-1     C/T                                     18,16
3-II-1    C/T                                     38,30
3-II-2    C/C                                     66,0
3-III-1   C/T                                     37,57
3-III-2   C/C                                     13,0

Figure 3.1: Genotypes for the ARHGAP39 variant that shows segregation in patients with cancer in family 3
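The within-family part of the segregation criteria (every carrier is affected, and every affected individual is a carrier) can be checked mechanically. The sketch below uses the genotypes and affection status shown in Figure 3.1 for the ARHGAP39 variant in family 3; the function names are hypothetical, and the first criterion (the variant being seen in only one family) is not covered because it needs data from all three families.

```python
from typing import Dict, Set


def carriers(genotypes: Dict[str, str], ref: str) -> Set[str]:
    """Individuals whose genotype contains at least one non-reference allele."""
    return {
        person for person, gt in genotypes.items()
        if any(allele != ref for allele in gt.split("/"))
    }


def segregates_completely(genotypes: Dict[str, str], affected: Set[str],
                          ref: str) -> bool:
    """True when the set of carriers is non-empty and equals the affected set."""
    carrier_set = carriers(genotypes, ref)
    return bool(carrier_set) and carrier_set == affected


# Genotypes and cancer status for family 3, taken from Figure 3.1.
arhgap39 = {
    "3-I-1": "C/T",
    "3-II-1": "C/T",
    "3-II-2": "C/C",
    "3-III-1": "C/T",
    "3-III-2": "C/C",
}
affected_family3 = {"3-I-1", "3-II-1", "3-III-1"}

print(segregates_completely(arhgap39, affected_family3, ref="C"))
```

A single unaffected carrier or a single affected non-carrier makes the check fail, which reflects how strict the complete penetrance assumption is.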

3.4.3 Known rare variants

Association analysis in SOLAR

The annotated known rare variants (Table 3.3) were tested for association with cancer phenotypes using SOLAR. The results from SOLAR were corrected for multiple testing using the Bonferroni method with the number of variants prioritised (8,840). The significance level after correction was α < 5.66 x 10^-6. No variants were significant after correcting for multiple testing.

Table 3.5 contains a summary of the nominally associated variants (p-value < 0.05) for the age at onset of cancer, the age at onset of sarcoma, and cancer status. The results include ten variants that showed nominal association with age at onset of cancer (eight putative structural and two putative regulatory variants), one putative regulatory variant that showed nominal association with age at onset of sarcoma, and 15 variants that showed nominal association with cancer status (12 putative structural and three putative regulatory variants). There were no variants with a p-value < 0.05 for sarcoma status. Of the 19 distinct variants, 12 were associated with a single cancer phenotype and seven were associated with more than one cancer phenotype. All seven of the latter were associated with both cancer status and age at onset of cancer.
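The Bonferroni-corrected significance level is the nominal level divided by the number of tests. The short sketch below applies this to the list sizes reported for the known rare variants (8,840) and candidate gene variants (1,297) strategies; the function name is hypothetical and a nominal level of 0.05 is assumed.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the family-wise level at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# Numbers of prioritised variants reported for two of the strategies.
list_sizes = {"known rare variants": 8840, "candidate gene variants": 1297}

for strategy, n in list_sizes.items():
    print(f"{strategy}: alpha < {bonferroni_threshold(0.05, n):.2e}")
# known rare variants: alpha < 5.66e-06
# candidate gene variants: alpha < 3.86e-05
```

Because each strategy is corrected against its own list size, the shorter candidate gene list has a less stringent threshold than the longer known rare variants list.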

97 66 Table 3.5: Summary of SOLAR association results for known rare variants Chr:Pos Gene p-value Beta SE Exonic function SIFT PolyPhen-2 RegulomeDB GERP Ref Alt MAF 1000G MAF Age at onset of cancer 1: ZFP69B NS D D 3a 3.33 C G : BEAN NS D C A : UVSSA NS D D G A : C16orf NS D D 2b 5.22 T C : C6orf NS D D A T : ADAMTS NS D D C T : DLG NS D D C G : MYC < NS T D 2b 3.91 A G : KIF2C S.. 2b. G A : ABCB S.. 2c. G A Age at onset of sarcoma 16: SF3B3 < Int.. 2b. T G Cancer status 16: C16orf NS D D 2b 5.22 T C : ZFP69B NS D D 3a 3.33 C G : UVSSA NS D D G A : ADAMTS NS D D C T : DLG NS D D C G

98 Chr:Pos Gene p-value Beta SE Exonic function SIFT PolyPhen-2 RegulomeDB GERP Ref Alt MAF 1000G MAF 13: STARD NS D D C T : HR NS D D G A : RRS NS D D C A : SYNPR NS D D C T : MOCS NS D D G A : SPATA NS D D A T : C6orf NS D D A T : ABCB S.. 2c. G A : IRAK NS T b C G : KRTAP NS T b T C Chr:Pos: Chromosome:Position. SE: Standard Error. SIFT: Sorting Intolerant from Tolerant score. PolyPhen-2: Polymorphism Phenotyping-2. GERP: Genomic Evolutionary Rate Profiling score. Ref: reference allele. Alt: alternate allele. MAF 1000G: Minor Allele Frequency in 1000 Genomes Project. MAF: Minor Allele Frequency in study population. NS: nonsynonymous. S: synonymous. Int: intronic. UTR3: 3 untranslated region. UTR5: 5 untranslated region. D: deleterious..: not annotated in database. 67

Segregation analysis results

Of the 19 variants, 13 were seen in only one family (ZFP69B, BEAN1, UVSSA, C16orf96, ADAMTS14, DLG5, KIF2C, ABCB5, STARD13, HR, RRS1, SYNPR and MOCS2). Using the three criteria for familial segregation, six variants showed complete familial segregation. Two conserved, deleterious variants showed complete familial segregation in family 2 (Figure 3.2). An exonic nonsynonymous variant in the C16orf96 gene showed nominal association with age at onset of cancer and cancer status. A synonymous variant in the ABCB5 gene also showed nominal association with age at onset of cancer and cancer status. Each family member with cancer was heterozygous at these positions, whereas unaffected family members were homozygous for the reference allele.

[Pedigree of family 2. 2-I-1; 2-I-2: prostate; 2-II-1: sarcoma; 2-II-2: melanoma; 2-II-3: melanoma; 2-II-4; 2-III-1. Key: affected male, affected female, unaffected male, unaffected female, proband.]

Cells show the genotype at the variant position in each gene, with read depth (ref, alt) in parentheses.

Patient   C16orf96       ABCB5
2-I-1     T/T (119,0)    G/G (69,0)
2-I-2     T/C (28,38)    G/A (6,4)
2-II-1    T/C (63,38)    G/A (32,41)
2-II-2    T/C (36,27)    G/A (54,65)
2-II-3    T/C            G/A (41,38)
2-II-4    T/T (62,1)     G/G (65,0)
2-III-1   T/T (42,0)     G/G (96,0)

Figure 3.2: Genotypes for the C16orf96 and ABCB5 variants that show segregation in patients with cancer in family 2

Using the three criteria for familial segregation, four conserved, deleterious variants showed complete familial segregation in family 3 (Figure 3.3). Exonic variants in the ZFP69B and UVSSA genes showed nominal association with both age at onset of cancer and cancer status in family 3. Two exonic variants, in the BEAN1 and KIF2C genes, showed nominal association with age at onset of cancer in family 3. All patients with cancer in family 3 were heterozygous at these positions, and unaffected family members were homozygous for the reference allele. None of the other prioritised known rare variants showed complete familial segregation in any of the families according to the familial segregation criteria.

Figure 3.3: Genotypes for the ZFP69B, BEAN1, UVSSA and KIF2C variants that show segregation in patients with cancer in family 3

Pedigree of family 3 (key: affected male, affected female, unaffected male, unaffected female, proband): 3-I-1 (sarcoma); 3-II-1 (prostate); 3-II-2; 3-III-1 (sarcoma); 3-III-2.

Patient | ZFP69B genotype | Read depth (ref, alt) | UVSSA genotype | Read depth (ref, alt) | BEAN1 genotype | Read depth (ref, alt) | KIF2C genotype | Read depth (ref, alt)
Patient 3-I-1 | C/G | 27,19 | G/A | 58,31 | C/A | 28,26 | G/A | 59,52
Patient 3-II-1 | C/G | 29,37 | G/A | 75,69 | C/A | 18,19 | G/A | 50,66
Patient 3-II-2 | C/C | 118,0 | G/G | 129,0 | C/C | 62,0 | G/G | 126,0
Patient 3-III-1 | C/G | 24,41 | G/A | 127,64 | C/A | 44,30 | G/A | 125,107
Patient 3-III-2 | C/C | 44,0 | G/G | 85,0 | C/C | 52,0 | G/G | 118,0

3.4.4 Candidate gene variants

Association analysis in SOLAR

The annotated candidate gene variants (Table 3.3) were tested for association with cancer phenotypes using SOLAR. The results from SOLAR were corrected for multiple testing using the Bonferroni method, with the number of prioritised variants (1,297) as the number of tests. The significance level after correction was α = 3.86 x 10^-5 (0.05/1,297). No variants were significant after correcting for multiple testing. Table 3.6 contains a summary of the nominally associated variants (p-value < 0.05) for the age at onset of cancer, the age at onset of sarcoma, and cancer status. The results include 14 variants that showed nominal association with age at onset of cancer (two putative structural and 12 putative regulatory variants), two putative regulatory variants that showed nominal association with age at onset of sarcoma, and 12 variants that showed nominal association with cancer status (one putative structural and 11 putative regulatory variants). There were no variants showing an association with a p-value < 0.05 for sarcoma status. Of the total variants, 18 variants were associated with a single cancer phenotype, and five variants were associated with more than one cancer phenotype. Three variants were associated with age at onset of cancer and cancer status, one variant was associated with age at onset of cancer and age at onset of sarcoma, and one variant was associated with age at onset of sarcoma and cancer status.
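The Bonferroni correction described above can be reproduced with a few lines of arithmetic. The following is a minimal sketch, not the SOLAR workflow itself: the function names are hypothetical, and the only inputs taken from the text are the nominal level of 0.05 and the 1,297 prioritised variants.

```python
# Minimal sketch of a Bonferroni-corrected significance threshold.
# Function names are illustrative; only 0.05 and 1,297 come from the text.

def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Return the per-test significance threshold after Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be a positive integer")
    return alpha / n_tests


def is_significant(p_value: float, alpha: float, n_tests: int) -> bool:
    """True if a p-value survives Bonferroni correction for n_tests tests."""
    return p_value < bonferroni_threshold(alpha, n_tests)


threshold = bonferroni_threshold(0.05, 1297)
print(f"{threshold:.2e}")                 # 3.86e-05
print(is_significant(0.04, 0.05, 1297))   # False: nominal only
```

A variant with a nominal p-value just under 0.05 therefore falls well short of the corrected threshold, which is why only nominal associations are reported.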

Table 3.6: Summary of SOLAR association results for candidate gene variants

Chr:Pos Gene p-value Beta SE Exonic function SIFT PolyPhen-2 RegulomeDB GERP Ref Alt MAF 1000G MAF

Age at onset of cancer
16: PDIA NS D D C G
: ATM NS D D C G
: MEN Int.. 2b G C
: MAP4K Int.. 1f 1.94 A G
: ERCC2 < Int.. 2b 2.51 C T
: ETV Int.. 2b 2.49 G A
: TOP3A Int.. 1f G A
: SMCR S.. 2b 5.20 G T
: SHMT UTR3.. 1f G A
: SHMT UTR3.. 1f 1.74 G C
: SHMT Int.. 1f T C
: MYBPC S.. 1f G A
: MAP4K Int.. 1f 1.69 C T
: LYPD S.. 1f A G

Age at onset of sarcoma
8: RECQL S.. 2b 0.96 T C
: MYBPC S.. 1f G A

Cancer status
16: PDIA NS D D C G
: MEN Int.. 2b G C
: RGS Int.. 2b 3.34 C T
: ETV Int.. 2b 2.49 G A
: HNF4A Int.. 2b 2.12 G A
: GPT Int.. 2b G A
: RECQL Int.. 2b G A
: RECQL S.. 2b G A
: MYBPC S.. 2b 0.53 G A
: MRPL Int.. 2b G C
: CDH Int.. 2b 3.50 T C
: RECQL S.. 2b 0.96 T C

Chr:Pos: Chromosome:Position. SE: Standard Error. SIFT: Sorting Intolerant from Tolerant score. PolyPhen-2: Polymorphism Phenotyping-2. GERP: Genomic Evolutionary Rate Profiling score. Ref: reference allele. Alt: alternate allele. MAF 1000G: Minor Allele Frequency in 1000 Genomes Project. MAF: Minor Allele Frequency in the study population. NS: nonsynonymous. S: synonymous. Int: intronic. UTR3: 3' untranslated region. UTR5: 5' untranslated region. D: deleterious. ".": not annotated in database.

Segregation analysis results

Of the 23 different candidate variants identified, five variants were seen in only one family (PDIA2, ERCC2, HNF4A, MYBPC3 and MRPL28). However, using all three criteria for familial segregation, only one variant, in the PDIA2 gene, showed nominal association with age at onset of cancer and cancer status and complete familial segregation in family 2 (Figure 3.4). Each family member with cancer was heterozygous at this position, whereas unaffected family members were homozygous for the reference allele at this position. None of the other prioritised candidate gene variants showed complete familial segregation in any of the families according to the familial segregation criteria.

Figure 3.4: Genotypes for the PDIA2 variant that shows segregation in patients with cancer in family 2

Pedigree of family 2 (key: affected male, affected female, unaffected male, unaffected female, proband): 2-I-1; 2-I-2 (prostate); 2-II-1 (sarcoma); 2-II-2 (melanoma); 2-II-3 (melanoma); 2-II-4; 2-III-1.

Patient | Genotype at position in PDIA2 gene | Read depth (ref, alt)
Patient 2-I-1 | C/C | 206,1
Patient 2-I-2 | C/G | 109,135
Patient 2-II-1 | C/G | 74,67
Patient 2-II-2 | C/G | 63,43
Patient 2-II-3 | C/G | 106,108
Patient 2-II-4 | C/C | 138,0
Patient 2-III-1 | C/C | 161,0
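The genotype pattern stated above (every affected relative heterozygous, every unaffected relative homozygous for the reference allele) can be checked mechanically. The sketch below is illustrative only: it tests that single pattern rather than the three segregation criteria defined elsewhere in the thesis, the function names are hypothetical, and the set of affected patients is an assumption read from the pedigree labels of Figure 3.4.

```python
# Illustrative check of the segregation pattern described in the text.
# Assumptions: function names are hypothetical, and the affected set is
# inferred from the pedigree labels accompanying Figure 3.4.

def is_heterozygous(genotype: str) -> bool:
    """True if the two alleles of a 'A/B' genotype string differ."""
    first, second = genotype.split("/")
    return first != second


def is_homozygous_reference(genotype: str, ref: str) -> bool:
    """True if both alleles equal the reference allele."""
    return all(allele == ref for allele in genotype.split("/"))


def segregates_completely(genotypes: dict, affected: set, ref: str) -> bool:
    """Affected relatives must be heterozygous, unaffected homozygous reference."""
    for patient, genotype in genotypes.items():
        if patient in affected:
            if not is_heterozygous(genotype):
                return False
        elif not is_homozygous_reference(genotype, ref):
            return False
    return True


# PDIA2 genotypes as listed for Figure 3.4 (reference allele C).
pdia2 = {
    "2-I-1": "C/C", "2-I-2": "C/G", "2-II-1": "C/G", "2-II-2": "C/G",
    "2-II-3": "C/G", "2-II-4": "C/C", "2-III-1": "C/C",
}
affected = {"2-I-2", "2-II-1", "2-II-2", "2-II-3"}  # assumed from pedigree labels

print(segregates_completely(pdia2, affected, ref="C"))  # True
```

Under this rule a single unaffected carrier makes the check fail, which mirrors the conservative nature of the criteria acknowledged later in the discussion.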

3.4.5 Evidence further supporting germline risk genes

The nominally significant (p-value < 0.05) variants that showed familial segregation were researched using several in silico resources. Table 3.7 contains a combined summary of several in silico resources for all nominally significant candidate germline risk variants identified by the three prioritisation strategies that showed familial segregation and the genes in which they arise. Evidence from the table indicates that none of the candidate risk variants identified were reported in COSMIC (database of genes somatically mutated in cancers). However, six of the genes in which germline risk variants were identified were each reported to have mutations in the COSMIC database. None of the genes were listed in the COSMIC cancer gene census. They also were not listed in the PubMeth database, which suggests there is currently no evidence of methylation of these genes in cancer. Two of the genes were reported to have gene functions that support involvement in cancer pathogenesis in NCBI. The Gene References into Functions (GeneRIF) for ABCB5 suggests a role for this gene in chemoresistance and the GeneRIF for KIF2C suggests this gene is involved in directional migration and invasion of tumour cells.

The PubMed searches for the eight candidate risk genes are summarised in Table 3.8. The PubMed searches revealed previously published associations between the ABCB5, KIF2C, and PDIA2 genes and cancer. However, there is no supporting evidence for the involvement of the ARHGAP39, C16orf96, ZFP69B, UVSSA and BEAN1 genes in cancer pathogenesis at this time. The single publication returned by the search strategy for the ARHGAP39 gene revealed a role for the gene as a binding partner for CNK2, which is a spatial modulator of Rac cycling during spine morphogenesis. 297 This publication did not report any association between ARHGAP39 and cancer. The PubMed search for the BEAN1 gene returned results on randomised soya trials, labelled BEAN1 and BEAN2. 298, 299 No publications were returned on the function of the BEAN1 gene or its involvement in cancer pathogenesis.

Table 3.7: Summary of findings from in silico resources investigating the role of candidate germline risk variants in cancer pathogenesis

Gene: ARHGAP39; Genomic location: 8:; Variant in COSMIC: No; No. mutations in COSMIC: 246; Cancer gene census: No; SuperPath: Developmental biology, Signalling by Robo receptor, p75 NTR receptor-mediated signalling, Signalling by GPCR, Signalling by Rho GTPases; GO molecular function: GTPase activator activity; Methylation: No; GeneRIF: No

Gene: C16orf96; Genomic location: 16:; Variant in COSMIC: No; No. mutations in COSMIC: 155; Cancer gene census: No; SuperPath: .; GO molecular function: .; Methylation: No; GeneRIF: No

Gene: ABCB5; Genomic location: 7:; Variant in COSMIC: No; No. mutations in COSMIC: 332; Cancer gene census: No; SuperPath: ABC-family proteins mediated transport, Transmembrane transport of small molecules; GO molecular function: ATP binding, Xenobiotic-transporting ATPase activity, Efflux transmembrane transporter activity, ATPase activity; Methylation: No; GeneRIF: Chemoresistance

Gene: ZFP69B; Genomic location: 1:; Variant in COSMIC: No; No. mutations in COSMIC: 0; Cancer gene census: No; SuperPath: Gene expression; GO molecular function: DNA binding, Transcription factor activity (sequence-specific DNA binding), Protein binding, Metal ion binding; Methylation: No; GeneRIF: No

Gene: UVSSA; Genomic location: 4:; Variant in COSMIC: No; No. mutations in COSMIC: 0; Cancer gene census: No; SuperPath: Transcription-coupled nucleotide excision repair, DNA double strand break repair; GO molecular function: RNA polymerase II core binding, Protein binding; Methylation: No; GeneRIF: No

Gene: BEAN1; Genomic location: 16:; Variant in COSMIC: No; No. mutations in COSMIC: 31; Cancer gene census: No; SuperPath: .; GO molecular function: .; Methylation: No; GeneRIF: No

Gene: KIF2C; Genomic location: 1:; Variant in COSMIC: No; No. mutations in COSMIC: 141; Cancer gene census: No; SuperPath: Golgi-to-ER retrograde transport, Cell cycle, Mitotic metaphase and anaphase, Mitotic prometaphase, Vesicle-mediated transport; GO molecular function: Microtubule motor activity, Protein binding, ATP binding, Microtubule binding, ATPase activity; Methylation: No; GeneRIF: Directional migration and invasion of tumour cells

Gene: PDIA2; Genomic location: 16:; Variant in COSMIC: No; No. mutations in COSMIC: 114; Cancer gene census: No; SuperPath: Statin pathway; GO molecular function: Protein disulfide isomerase activity, Steroid binding, Protein binding, Lipid binding, Disulfide oxidoreductase activity; Methylation: No; GeneRIF: No

Genomic location: chromosome:position. COSMIC: Catalogue of Somatic Mutations in Cancer database. 134 No. mutations in COSMIC: the number of mutations reported in the gene in the COSMIC database. Cancer gene census: is the gene reported in the cancer gene census in COSMIC? The cancer gene census is a catalogue of genes for which mutations have been causally implicated in cancer. SuperPath: from PathCards, an integrated database of human pathways and their annotations. 293 Human pathways are clustered into SuperPaths based on gene content similarity. GO molecular function: Gene Ontology molecular function. 294 Methylation: is the gene reported to be methylated in cancer by PubMeth? 295 GeneRIF: Gene References Into Functions from the National Center for Biotechnology Information. 296 Are any GeneRIFs associated with cancer reported for the gene? Robo: Roundabout family of proteins. NTR: Neurotrophins. GPCR: G-protein-coupled receptors. GTPase: Guanosinetriphosphatase. ABC: ATP-binding cassette. ATP: Adenosine triphosphate. ATPase: Adenosinetriphosphatase. ER: endoplasmic reticulum. ".": not annotated in database.

Table 3.8: Summary of search results from PubMed for genes in which germline variants were identified

Gene | No. of publications | Role of gene | Selected references
ARHGAP39 | | |
C16orf96 | | |
ABCB5 | 109 | ABCB5 is a drug efflux pump associated with melanoma, colon cancer, Merkel cell carcinoma, oral squamous cell carcinoma, acute leukemia, colorectal cancer, hepatic cancer, breast cancer and osteosarcoma drug resistance. ABCB5 has also been found to be overexpressed at the transcriptional level in a number of cancer subtypes, including breast cancer and melanoma. Alterations in ABCB5 have been reported in lung cancer. |
ZFP69B | | |
UVSSA | 5 | UVSSA is involved in transcription-coupled nucleotide excision repair by relieving RNA polymerase II arrest at damaged sites to permit repair of the template strand. Mutations in UVSSA are associated with Cockayne syndrome group B (characterised by photosensitivity, growth failure, progressive neurodevelopmental disorder, and premature ageing but no predisposition to skin cancer). |
BEAN1 | 2 | . |
KIF2C | 55 | KIF2C (also known as MCAK) is critical in the regulation of microtubule dynamics during mitosis. KIF2C is also involved in the directional migration and invasion of tumour cells and plays a role in cell proliferation. KIF2C is a gene likely to be involved in carcinogenesis. |
PDIA2 | 3 | Gene expression of PDIA2 was found to influence the prognostic significance of TWIST (correlated with cancer invasion and metastasis in several human cancers). PDIA2 plays a role in the maintenance of endoplasmic reticulum homeostasis and endoplasmic reticulum stress-induced apoptosis. | 324, 325

The PubMed search was performed using the string ("gene name") AND (cancer OR malignancy OR tumor* OR tumour* OR sarcoma) in April. Abstracts were screened for relevance to the current study.

3.5 Discussion

The filtering and prioritisation of eight germline variants generated by WES in three families were described in this chapter. Eight candidate germline risk variants were found to show nominal association with cancer and age at onset of cancer in two of the three cancer cluster families.

3.5.1 Variant filtering and prioritisation strategies

The annotation results using ANNOVAR are consistent with a previous publication reporting that a substantial number of DNA fragments from WES capture fall outside the target regions. 256 There were slightly more synonymous variants than nonsynonymous variants, which is also consistent with previous findings. 286 SIFT and PolyPhen-2 scores from ANNOVAR annotation were used to determine if the variants were likely to have a deleterious effect on protein function. A previous study reports reasonable sensitivity for SIFT and PolyPhen-2 (69% and 68%, respectively) but low specificity (13% and 16%, respectively). 326 Therefore, both programs have a high false-positive rate, and these results should be interpreted with caution and reported in the context of other available evidence. 326 In addition to variants reported as deleterious or tolerated by SIFT and PolyPhen-2, a number of variants were not annotated with a score (unknown). In particular, 80% of variants prioritised by the rare variants strategy were filtered out because they were unknown in both databases. Although two of the prioritisation strategies identified more regulatory variants than structural variants, of the eight candidate risk variants that showed familial segregation, seven were structural, and only one was regulatory. In this study, exome sequencing combined with variant filtering and prioritisation was an efficient strategy for identifying risk alleles in cancer cluster families.
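The link between the low specificity quoted above and a high false-positive rate follows from the definition false-positive rate = 1 - specificity. The short sketch below simply applies that identity to the two specificity values cited in the text; the function name is hypothetical and nothing here reproduces the cited study's analysis.

```python
# Worked arithmetic: false-positive rate implied by a reported specificity.
# The function name is illustrative; the two values come from the text.

def false_positive_rate(specificity: float) -> float:
    """False-positive rate is the complement of specificity."""
    if not 0.0 <= specificity <= 1.0:
        raise ValueError("specificity must lie between 0 and 1")
    return 1.0 - specificity


reported_specificity = {"SIFT": 0.13, "PolyPhen-2": 0.16}
for tool, specificity in reported_specificity.items():
    print(f"{tool}: false-positive rate = {false_positive_rate(specificity):.0%}")
```

On these figures, most benign variants would be flagged as deleterious, which is why the predictions are treated as one line of evidence among several.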

3.5.2 Association and segregation analyses of candidate risk variants in families

Family segregation studies are re-emerging as an optimal way to classify extremely rare variants. 327 In this study, three assumptions were made in determining familial segregation. These assumptions did not take into account the possibility of unaffected carriers (incomplete penetrance), later onset of disease, or risk variants that occur in cases in more than one family. Therefore, some true variants may have been excluded using these assumptions. SOLAR was used to test for association of filtered and prioritised variants with both age at onset of disease and disease status. Despite efforts to filter and prioritise variants, no variants reached statistical significance after correcting for multiple testing and a nominal p-value of < 0.05 was therefore used to select variants for familial segregation analyses. The large number of variants identified and the relatively small sample size are the likely reasons that no variants reached statistical significance after correcting for multiple testing in this study. Despite these limitations, by treating each family as a separate discovery unit, it was hoped that some insight might be gained into genetic contributions to the risk of cancer in each family. Eight variants nominally associated with age at onset of cancer and cancer status were identified in two of the three cancer cluster families. The candidate risk variants identified in this study were all private variants, seen only in one family. There has been increasing awareness that rare variants of modest to large effect contribute to complex diseases and may explain a substantial proportion of missing heritability. 129 There has been, therefore, a return to family-based studies to identify rare risk variants involved in common human disease. 133
Recent sequencing studies have shown that the rate of private mutations in individuals is larger than previously expected. Rare, private mutations found in families could be due to the explosion of human populations and the slowing of negative selection by improved food supplies, sanitation, vaccines and routine health care. Rare variants that are private to families could constitute a proportion of disease risk variants. 328 It is plausible that the variants found in the ABCB5, KIF2C and PDIA2 genes may be involved in the pathogenesis of cancer based on previous publications and the proposed function of the protein. Each of these genes is discussed in more detail below.

The ABCB5 gene

The ABCB5 gene encodes an ATP-binding cassette (ABC) drug efflux transporter present in a number of stem cells. 332, 333 ABCB5 functions as a determinant of membrane potential and regulator of cell fusion in physiologic skin cells. 334 This gene is also expressed in clinical malignant melanoma tumours and preferentially marks tumour cells expressing the CD133+ stem cell phenotype. 334 ABCB5 is a rhodamine-123 efflux transporter and marks CD133-expressing progenitor cells. ABCB5 regulates membrane potential in these progenitor cells and determines the propensity to undergo cell fusion. 334 Membrane hyperpolarisation is associated with the multidrug resistance phenotype of human cancer cells. 335 ABCB5 plays a role in multi-drug resistance of multiple malignancies including human malignant melanoma, 333, 336, 337 colon cancer, 304, 338 Merkel cell carcinoma, 305 oral squamous cell carcinoma, 306 acute leukaemia, 307 colorectal cancer, 309 hepatocellular carcinoma, 310 breast cancer, 311 and osteosarcoma. 303 Melanoma is resistant to the effects of doxorubicin, 333 a chemotherapy drug used to treat many different types of cancer. It has been proposed that the ABCB5 drug efflux function may be involved in doxorubicin resistance. 334 The variant identified in the ABCB5 gene may be phenotypically relevant to family 2 as this family has two family members affected by melanoma (Patient 2-II-2 and Patient 2-II-3), in addition to a prostate cancer case (Patient 2-I-2), and a sarcoma case (Patient 2-II-1).

The KIF2C gene

KIF2C is a kinesin-like protein that functions as a microtubule-dependent molecular motor. 339 The KIF2C gene (also known as MCAK) is one of the best characterised members of the kinesin-13 family and plays an important role in microtubule dynamics during mitosis. 320 The deregulation of KIF2C induces defects in spindle assembly, chromosome congression and segregation, leading to chromosome instability, one of the hallmarks of cancer. 320 KIF2C is important for the migration and invasion of tumour cells via the modulation of microtubule dynamics in the cytoskeleton. 320, 345, 346 The KIF2C gene has been identified as a tumour antigen in patients with colorectal cancer. 347 The overexpression of KIF2C is associated with a more invasive and metastatic phenotype and poor prognosis for breast, gastric and colorectal cancer patients. KIF2C may represent an attractive target for antigen-specific immunotherapies in colorectal cancer and other malignancies. 347, 348

The PDIA2 gene

The PDIA2 gene encodes the pancreas-specific member of the protein disulphide isomerase (PDI) family of proteins. PDIA2, as with other PDIs, has a central role as a reductase, an oxidase, an isomerase and a molecular chaperone in the endoplasmic reticulum. 351 It has been proposed that PDIA2 plays a role in the production and secretion of digestive enzymes in vivo 352 and in the binding and regulation of oestrogen synthesis. 353 A higher level of PDIA2 expression was found to be associated with shorter survival time in patients whose prostate cancer expressed a high level of TWIST but not in patients whose prostate cancer expressed a low level of TWIST. 324 TWIST is an oncogene that is correlated with cancer invasion and metastasis in human cancers including breast cancer, rhabdomyosarcoma, gastric carcinomas, bladder and prostate cancer. Little is known about the role of PDIA2 in prostate cancer, although lower levels of PDIA2 expression were associated with better survival. 324 Therefore, PDIA2 may promote cancer progression. 324 However, PDIA2 alone was a poor prognostic marker for prostate cancer.

3.5.3 Conclusion

In conclusion, WES data was annotated, filtered and prioritised in an attempt to identify candidate germline risk variants that may be involved in cancer or sarcoma pathogenesis in three cancer cluster families. As there is no gold standard for the filtering and prioritisation of WES data, these results represent the current state of tools, databases and knowledge of cancer biology. With the data obtained, it is not possible to determine whether the variants in the ARHGAP39, C16orf96, ZFP69B, UVSSA and BEAN1 genes are pathogenic mutations. These genes, however, become candidates that can be further tested for association with cancer in independent families and study populations. With further genetic evidence of involvement in risk of cancer, functional studies including assays of patient-derived tissue or well-established cell or animal models of gene function could be undertaken to determine the causal effect of all candidate risk variants on the cancer phenotype. 236 Due to time and budget limitations, these types of functional studies are beyond the scope of this thesis.


Chapter 4 Aim 3: A comparison of matched tumour and germline DNA from two sarcoma patients

4.1 Introduction

Next Generation Sequencing (NGS) of tumour samples and matched germline samples is a powerful strategy for studying the genetic basis of cancer initiation, development, and growth. 133 The third aim of this study was to perform a matched tumour and germline analysis on two myxoid liposarcoma patients using peripheral blood genomic DNA and genomic DNA isolated from tumour tissue to identify somatic mutations.

Myxoid liposarcoma

Myxoid liposarcomas are the second most common group of adipocytic/lipogenic sarcomas. 64 Myxoid liposarcomas are malignant tumours composed of uniform round to oval shaped primitive non-lipogenic cells and a variable number of small signet-ring cell lipoblasts. 64 The tumours typically exhibit a FUS-DDIT3 or EWSR1-DDIT3 rearrangement. 64 Myxoid liposarcomas occur most commonly in the deep soft tissue of the extremities and very rarely in the retroperitoneum.

Somatic variants

A comparison of matched tumour and germline samples from a patient allows researchers to distinguish between somatic variation (< 0.01% of variants) and inherited germline variation (> 99.99% of variants). 133 Germline variants are those that exist in the germline DNA, which is the source of DNA for all cells in the body. 149 A variant contained within the germline can be passed from parent to offspring. The identification of putative germline variants was the focus of Aim 2 (Chapter 3). Therefore, germline variants will not be reported in this chapter. In contrast, somatic variants are those found in the tumour DNA but not in the germline DNA. 149 Most cancers arise and evolve as a consequence of somatic mutations. 358 The characterisation of somatic mutations in cancer genomes is essential for understanding the disease and for the development of targeted therapeutics. 359 Over the last three decades, more than 600 genes have been shown to be somatically mutated in cancers. 134, 358 Molecular characterisation of somatic driver mutations allows greater understanding of biological abnormalities within cancer cells and provides information on the function of gene products, and relationships between genes and biochemical pathways. 134 Development of new therapeutic and preventative agents is dependent on the identification and modulation of these molecular targets. 134, 360 Targeted therapies for advanced lung cancer, 361 melanoma, 362 colorectal cancer, 363 and gastrointestinal stromal tumour 364 are examples that have resulted from the translation of knowledge gained from genomics. In addition to somatic variants, a comparison of matched tumour and germline DNA can also identify the absence of heterozygosity at loci in tumour DNA compared to germline DNA.

Loss of heterozygosity

Loss of heterozygosity (LOH) is a common genetic event in cancer development.
365 LOH is a change in polymorphic markers from a heterozygous state in the germline DNA to a homozygous state in the tumour DNA. 366 In cancers, the absence of one functional copy of a tumour suppressor gene does not affect the phenotype. However, if LOH occurs and the remaining normal copy of the tumour suppressor gene is lost, this will result in the complete loss of the protective function of the tumour suppressor gene. LOH is known to be involved in the somatic loss of wild-type alleles in many inherited cancer syndromes such as retinoblastoma and hereditary breast and ovarian cancer syndromes. 366, 367

Somatic copy number alteration

In addition to distinguishing somatic and LOH variants, somatic copy number alterations (SCNA) can be identified in a tumour sample relative to the matched germline sample by comparing the normalised read depth. 368, 369 The DNA sequence copy number is the number of copies of DNA in a region of a genome. 370 Cancer progression often involves alterations in DNA copy number. 370 In humans, the normal copy number is two for all the autosomes. Copy number variations (CNVs) are defined as structurally variant regions, larger than one kilobase (kb) in size, in which copy number differences have been observed between two or more genomes. 371 CNVs can alter transcription of genes by changing the dosage or by disrupting proximal or distant regulatory regions. 372 SCNA, as distinguished from germline CNVs, play a role in activating oncogenes and inactivating tumour suppressor genes. 13 Identification of SCNA can provide valuable insights into the cellular defects that cause cancer and suggest potential therapeutic strategies. 373 SCNA and CNVs have a significant role in tumourigenesis in many cancers including gastric cancer, 374 ovarian cancer, 375 hepatocellular carcinoma, 376 testicular germ cell tumours, 377 colorectal carcinoma, 378 and bladder cancer. 379 The characterisation of focal SCNAs has led to the identification of novel cancer genes such as MYB, PAX5 and DUSP.

Bioinformatic assessment of matched tumour and germline samples

A number of bioinformatic tools have been developed to analyse matched tumour and germline samples.
Initially, these tools used algorithms that involved calling variants in the tumour and germline samples separately followed by classification using a statistical significance test or simple subtraction. 388 More recently, tools have been developed that compare the tumour and germline directly at each locus. VarScan2 and Strelka are two calling algorithms that were specifically designed for the joint analysis of matched tumour and germline samples. 368, 369, 389 VarScan2 uses tumour and germline samples to heuristically detect sequence variants and classify them by somatic status (germline, somatic or LOH). 368, 369 Strelka utilises a novel Bayesian approach to represent continuous allele frequencies for both tumour and normal samples to efficiently identify somatic variants. 389 Using Strelka, the normal sample is represented as a mixture of diploid germline variation with noise, and the tumour sample is represented as a combination of the normal sample with somatic variation. 389 It is important to identify somatic mutations in cancer studies as these variants often play important roles in tumour development and treatment decisions.

Somatic mutations and drug sensitivity

The identification of somatic driver mutations that arise in tumours is important in developing new cancer therapeutic targets as genetic variation influences the response of an individual to drug treatments. 390 The current treatment for most cancers includes using cytotoxic chemotherapy, which is not precisely targeted to the somatic mutations that drive malignant transformation. 390 Somatic mutations can influence tumour behaviour and clinical outcome. Therefore, therapies should be targeted to the patient's tumour genotype rather than a generic treatment. An increased understanding of somatic mutations in individual patients has the potential to make therapies safer and more effective by assisting treatment selection and dosage based on driver mutations in the tumour. The Genomics of Drug Sensitivity in Cancer database ( org/) 391 is a large dataset on drug sensitivity in cancer cells linked to genomic information to facilitate the discovery of new biomarkers of drug response. 391 The database contains information on over 250 anticancer drugs across > 1,000 cell lines.
391 Molecular markers are identified by integrating data from the Catalogue of Somatic Mutations in Cancer (COSMIC) database 134 and cell line drug sensitivity data.
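The three somatic-status categories introduced earlier in this section (germline, somatic and LOH) can be illustrated with a simplified genotype comparison. The sketch below is a toy classifier written for illustration, with hypothetical function names; it compares called genotypes only and is not VarScan2's or Strelka's algorithm, both of which work from read counts and statistical models.

```python
# Toy classifier for somatic status from matched normal and tumour genotypes.
# Assumption: genotypes are 'A/B' strings; this is NOT VarScan2's algorithm,
# which evaluates read counts rather than called genotypes.

def classify_status(ref: str, normal_gt: str, tumour_gt: str) -> str:
    """Label a site as reference, germline, somatic, LOH or unknown."""
    normal = sorted(normal_gt.split("/"))
    tumour = sorted(tumour_gt.split("/"))
    normal_has_variant = any(allele != ref for allele in normal)
    tumour_has_variant = any(allele != ref for allele in tumour)

    if not normal_has_variant and not tumour_has_variant:
        return "reference"  # both samples match the reference genome
    if normal == tumour:
        return "germline"   # same variant genotype in both samples
    if not normal_has_variant:
        return "somatic"    # variant present in the tumour only
    if len(set(normal)) == 2 and len(set(tumour)) == 1:
        return "LOH"        # heterozygous germline, homozygous tumour
    return "unknown"


print(classify_status("C", "C/C", "C/G"))  # somatic
print(classify_status("C", "C/G", "C/G"))  # germline
print(classify_status("C", "C/G", "G/G"))  # LOH
```

Note that the LOH rule fires whichever allele is retained in the tumour, matching the definition of LOH as a heterozygous-to-homozygous change.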

4.1.4 Outline of chapter

Whole exome sequencing (WES) was performed on matched tumour and germline DNA from two myxoid liposarcoma patients from the families described in Chapter 2. VarScan2 was used to identify candidate somatic variants that were confirmed using Strelka. VarScan2 was also used to identify LOH variants and SCNA events to determine regions of interest in both patients.

4.2 Methods

Whole exome sequencing

Tumour DNA from formalin-fixed and paraffin-embedded (FFPE) tumour samples and germline DNA from Patient 1-II-2 and Patient 2-II-1 were available to perform a matched tumour-germline analysis. DNA was extracted at the Peter MacCallum Cancer Centre in Melbourne, Australia. After microdissection of tumour material from FFPE tissue, DNA was extracted using a DNeasy Tissue kit (Qiagen) as previously described. 392 Anti-coagulated blood was processed using a Ficoll gradient. DNA was extracted from the nucleated cell product using a QIAamp DNA blood kit (Qiagen). Patient 1-II-2 (Figure 4.1) is a male patient who was diagnosed with a myxoid liposarcoma at 39 years of age. Patient 2-II-1 (Figure 4.2) is a male patient who was diagnosed with a myxoid liposarcoma at 61 years of age.

Figure 4.1: Pedigree of family 1 highlighting sarcoma Patient 1-II-2 for tumour-germline comparison

Pedigree of family 1 (key: affected male, affected female, unaffected male, unaffected female, proband; * patient selected for tumour-normal analysis): 1-I-1; 1-I-2; 1-II-1; 1-II-2 * (sarcoma); 1-II-3; 1-III-1 (sarcoma); 1-III-2.

Figure 4.2: Pedigree of family 2 highlighting sarcoma Patient 2-II-1 for tumour-germline comparison

Pedigree of family 2 (key: affected male, affected female, unaffected male, unaffected female, proband; * patient selected for tumour-normal analysis): 2-I-1; 2-I-2 (prostate); 2-II-1 * (sarcoma); 2-II-2 (melanoma); 2-II-3 (melanoma); 2-II-4; 2-III-1.

Due to difficulties performing WES on older FFPE samples, 393 DNA extracted from these samples was sent to an external sequencing facility. The four samples were sequenced using Agilent SureSelect V5 Capture on the Illumina HiSeq 4000 at 60X coverage.

Pre-processing and quality control

FASTQ files were received from Macrogen, Inc. Initial quality control (QC) reports were generated using FastQC (version ), a quality control application for high throughput sequence data. 394 FastQC reads FASTQ files and can either provide an interactive application to review the results of several different checks or create an HTML-based report which can be integrated into a pipeline. 394 QC reports were generated on sequence quality, GC content, duplication levels and adapter content.

Adapter trimming

The presence of technical sequences such as adapters in WES data can result in suboptimal downstream analyses. 395 The Illumina-specific adapter sequences were trimmed from the FASTQ files using Trimmomatic (version 0.36). 395 As Illumina sequences are paired-end, the palindrome mode was used. This mode is specifically aimed at detecting typical adapter read-through situations in which the DNA fragment is shorter than the read length, which results in adapter contamination on the end of the reads. 395 After the Illumina-specific adapters had been trimmed from the FASTQ files, a second round of QC reports was generated on the adapter-trimmed data using FastQC.

Sequence alignment and calling

The trimmed sequencing data was then aligned to the human genome using the Burrows-Wheeler Aligner (BWA, version 0.7.2). 396 BWA alignment was performed in two steps. In the first step, the human genome build 19 (hg19) reference sequence was indexed. In the second step, BWA Maximal Exact Matches (BWA-MEM) was used to align the sequence reads to hg19.
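The two-step BWA alignment described above can be sketched as a pair of command lines assembled in Python. This is a minimal illustration under stated assumptions: the file names are hypothetical placeholders, the helper function names are invented for this example, and only the basic `bwa index` and `bwa mem` invocations are shown, without any of the options the thesis may have used. The commands are built and printed, not executed.

```python
# Sketch of the two BWA steps as argument lists (built, not executed).
# Assumptions: file names and helper names are hypothetical placeholders.
from pathlib import Path


def bwa_index_command(reference: Path) -> list[str]:
    """Step 1: index the reference sequence."""
    return ["bwa", "index", str(reference)]


def bwa_mem_command(reference: Path, read1: Path, read2: Path) -> list[str]:
    """Step 2: align paired-end reads with BWA-MEM (SAM is written to stdout)."""
    return ["bwa", "mem", str(reference), str(read1), str(read2)]


reference = Path("hg19.fa")                     # hypothetical reference file
read1 = Path("tumour_trimmed_R1.fastq.gz")      # hypothetical trimmed reads
read2 = Path("tumour_trimmed_R2.fastq.gz")

for command in (bwa_index_command(reference),
                bwa_mem_command(reference, read1, read2)):
    print(" ".join(command))
```

In practice the argument lists could be passed to `subprocess.run`, with the standard output of the second command redirected to a *.sam file.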

The alignment step writes its output in Sequence Alignment/Map (*.sam) format. SAMtools view (version 1.3.1) 397 was used to convert the *.sam files to *.bam format to reduce the size of the data. Summary statistics were created for the *.bam files using SAMtools flagstat. 397 Index files were created for each *.bam file using SAMtools index. 397 Local realignment was performed on the *.bam files in two stages using Genome Analysis Toolkit (GATK) RealignerTargetCreator and IndelRealigner (version 3.4.0). 180 The Picard (version 2.4.1) FixMateInformation tool was used to ensure that all read entries had their mate information written correctly. The Picard MarkDuplicates tool was then used to identify duplicate reads.

BAM quality control

A final round of QC was performed on the *.bam files using GATK DepthOfCoverage, 180 and Picard CollectInsertSizeMetrics and CollectAlignmentSummaryMetrics to determine coverage, insert size (the library portion between the adapter sequences) and alignment metrics, respectively.

Generate mpileup file

The germline and tumour *.bam files for each patient were grouped using SAMtools mpileup. 181 Alignment records were consolidated by sample identifiers in read group header lines.

Somatic variant calling using VarScan2

The genotype for each sample was determined from mpileup files using VarScan2 (version 2.3.9). 368 The algorithm read the data from both tumour and germline samples simultaneously. VarScan2 employed a heuristic approach to call variants that met the thresholds for read depth, base quality, variant allele frequency, and statistical significance. 368, 369 If the genotypes did not match, the read counts were evaluated by a one-tailed Fisher's exact test in a two-by-two table, comparing the numbers of reference-supporting and variant-supporting reads observed in the tumour with those observed in the germline. 368 If the resulting p-value met the significance threshold (default 0.10), the variant was called somatic (provided the germline matched the reference genome at that position). 368

The VarScan2 subcommand processSomatic was then used to create output files of somatic variants based on confidence (low confidence and high confidence). High confidence variants are classed as those with a tumour variant allele frequency > 15%, a normal variant allele frequency < 5%, and a somatic p-value below the high-confidence threshold. The remaining variants are classed as low confidence. VarScan2 somaticFilter was used to filter possible false positives from the high confidence somatic mutations. Table 4.1 shows the settings used to run the somaticFilter command.

Table 4.1: Parameters specified for VarScan2 somaticFilter to filter false positives from the high confidence somatic mutations

Parameter | Specified
Minimum read depth | 10
Minimum supporting reads for a variant | 2
Minimum number of strands on which variant observed | 1
Minimum average base quality for variant-supporting reads | 20
Minimum variant allele frequency threshold | 0.2
Default p-value threshold for calling variants | 1 x 10^-1

Bonferroni adjustments were made to the somatic p-values from VarScan2 to correct for multiple testing. 292 The total number of variants in the mpileup files for each patient was used for the correction. The genotypes of any significant variants were visually confirmed by importing the *.bam files into Integrative Genomics Viewer (IGV, version ) 183, 184 and determining the number of reads for each allele.

Somatic variant calling using Strelka

A second somatic variant caller, Strelka (version ), 389 was used to confirm the statistically significant somatic variants called by VarScan2. The first step of somatic variant analysis using Strelka is to run a preliminary configuration validation (ensuring that the chromosome names match in the *.bam header and reference genome). Template configuration files from Strelka were used in this analysis. The configuration step generates a makefile that controls the analysis. The second step is to run the analysis using the makefile. The sorted tumour and germline *.bam files and the hg19 reference sequence were used in the analysis.

Evidence further supporting somatic risk genes

The significant somatic variants and the genes in which they arise were further examined for evidence of a role in cancer pathogenesis using several in silico resources, including COSMIC (Catalogue of Somatic Mutations in Cancer), 134 the pathway unification database (PathCards), 293 gene ontology (GO) annotations, 294 PubMeth (a database of methylation in cancer), 295 and the National Center for Biotechnology Information (NCBI). 296 A PubMed search was performed using the string ("gene name") AND (cancer OR malignancy OR tumor* OR tumour* OR sarcoma) in April. Abstracts were screened for relevance to the current study.

Drug sensitivity

The genes in which somatic mutations were identified in the two sarcoma patients were searched in the Genomics of Drug Sensitivity in Cancer database 391 to determine whether they were known molecular targets.

Loss of heterozygosity variant calling using VarScan2

VarScan2 was used to call LOH variants. As in the somatic variant calling process, if the genotypes of the tumour and germline DNA did not match, the read counts were evaluated by a one-tailed Fisher's exact test. If the resulting p-value met the significance threshold (default 0.10), the variant was called LOH (provided the germline was heterozygous). Bonferroni adjustments were made to the LOH p-values from VarScan2 to correct for multiple testing. 292 The total number of variants in the mpileup files for each patient was used for the correction. The genotypes of any significant variants were visually confirmed by importing the *.bam files into IGV.
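The one-tailed Fisher's exact test used for both somatic and LOH calling can be sketched in plain Python. This is an illustration of the statistical idea only, not VarScan2's own code: the function names, the genotype labels and the read counts in the usage example are hypothetical, and treating "met the significance threshold" as `p < 0.10` is an assumption.

```python
from math import comb


def fisher_one_tailed(normal_ref, normal_var, tumour_ref, tumour_var):
    """Probability of seeing at least `tumour_var` variant reads in the
    tumour if tumour and germline shared one underlying allele frequency
    (hypergeometric upper tail of the two-by-two table)."""
    n_tumour = tumour_ref + tumour_var
    n_var = normal_var + tumour_var
    total = normal_ref + normal_var + n_tumour
    upper = min(n_tumour, n_var)
    tail = sum(
        comb(n_var, k) * comb(total - n_var, n_tumour - k)
        for k in range(tumour_var, upper + 1)
    )
    return tail / comb(total, n_tumour)


def call_variant(p_value, germline_genotype, threshold=0.10):
    """Label a site whose tumour and germline genotypes differ.

    `germline_genotype` is "hom_ref" or "het" (hypothetical labels).
    """
    if p_value >= threshold:
        return "not significant"
    if germline_genotype == "hom_ref":
        return "somatic"
    if germline_genotype == "het":
        return "LOH"
    return "unclassified"


if __name__ == "__main__":
    # Invented counts: germline 10 ref / 0 variant, tumour 5 ref / 5 variant.
    p = fisher_one_tailed(10, 0, 5, 5)
    print(f"p = {p:.4f}", call_variant(p, "hom_ref"))
```

Exact integer binomial coefficients avoid floating-point underflow at high read depths; the single division at the end converts the tail count to a probability.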

Variant annotation and filtering

Statistically significant somatic and LOH variants were annotated using Annotate Variation (ANNOVAR, version 2015Jun16) 245 and the Regulome database (RegulomeDB). 257 The somatic and LOH variants that reached statistical significance after Bonferroni correction were cross-referenced to the exclusion list of Fuentes Fajardo et al. (2012) (available in the paper's Supplementary material: Table S7 "gene exclusion list final") to determine whether any variants in highly polymorphic regions should be excluded.

Somatic copy number analysis using VarScan2

VarScan2 copynumber was applied to the tumour-germline mpileup files to create a single output file of raw SCNAs. VarScan2 copyCaller was then used to adjust for GC content and make preliminary calls. The adjusted calls files were imported into R, 281 and the package DNAcopy (version ) 398 was used to perform circular binary segmentation on a per-chromosome basis to smooth and segment the raw output from VarScan2 copyCaller. 370 The results of DNAcopy were plotted in R to visualise SCNA.

4.3 Results

Whole exome sequencing

Raw data reports from Macrogen, Inc. are summarised in Table 4.2. The GC content for an exome typically falls within the range of 49-51%. 399 Against this range, the samples show slightly below-average %GC content, with the tumour samples showing lower %GC content than the germline samples. Three of the four samples have over 90% of bases with a base quality (Q) score above 20 on the Phred scale (call accuracy of 99%); the exception is the Patient 2-II-1 tumour sample, which has 88.6% of bases with a Q score above 20. Pre-processing QC and adapter trimming did not result in any sequences being flagged or trimmed.
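The link between a Phred score and call accuracy quoted above follows from the standard definition Q = -10 log10(P), where P is the probability that a base call is wrong. A short sketch (the function names are illustrative):

```python
import math


def phred_to_accuracy(q):
    """Base-call accuracy implied by Phred score q."""
    return 1 - 10 ** (-q / 10)


def error_to_phred(p_error):
    """Phred score implied by a base-call error probability."""
    return -10 * math.log10(p_error)


if __name__ == "__main__":
    for q in (20, 30):
        print(f"Q{q}: accuracy = {phred_to_accuracy(q):.3%}")
```

A Q20 base therefore has a 1 in 100 chance of being wrong, and a Q30 base a 1 in 1,000 chance.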

Table 4.2: Raw data summary from Macrogen, Inc. for Patient 1-II-2 and Patient 2-II-1 germline and tumour samples

Sample ID | Total read bases (base pairs) | Total reads | GC(%) | AT(%) | Q20(%) | Q30(%)
Patient 1-II-2 germline | 7,720,761,786 | 76,443, | | | |
Patient 2-II-1 germline | 6,079,509,766 | 60,193, | | | |
Patient 1-II-2 tumour | 6,711,209,620 | 66,447, | | | |
Patient 2-II-1 tumour | 5,022,136,524 | 49,724, | | | |

Sample ID: sample name. Total read bases: total number of bases sequenced. Total reads: total number of reads. GC(%): GC content. AT(%): AT content. Q20(%): Ratio of bases that have a Phred quality score of over 20. Q30(%): Ratio of bases that have a Phred quality score of over 30.

Sequence alignment and calling

Summary statistics on the trimmed *.bam files were computed using SAMtools flagstat 181 and are presented in Table 4.3. The results show that both germline samples had over 99% of reads mapped, and both tumour samples had over 98% of reads mapped. Both germline samples had almost all of the mapped reads properly paired (> 98.8%). However, the tumour samples had a slightly lower proportion of properly paired reads (93.68% for Patient 1-II-2 and 95.86% for Patient 2-II-1).

Table 4.3: Summary statistics generated using SAMtools flagstat for Patient 1-II-2 and Patient 2-II-1 germline and tumour samples

Statistic | Patient 1-II-2 germline | Patient 2-II-1 germline | Patient 1-II-2 tumour | Patient 2-II-1 tumour
Total (QC-passed reads + QC-failed reads) | 76,335,490 | 60,113,342 | 63,327,174 | 49,372,458
Duplicates | | | |
Mapped (%) | 75,948,149 (99.49%) | 59,803,630 (99.48%) | 62,074,975 (98.02%) | 48,478,673 (98.19%)
Paired in sequencing | 76,335,490 | 60,113,342 | 63,327,174 | 49,372,458
Read 1 | 38,167,745 | 30,056,671 | 31,663,587 | 24,686,229
Read 2 | 38,167,745 | 30,056,671 | 31,663,587 | 24,686,229
Properly paired | 75,457,268 (98.85%) | 59,445,380 (98.89%) | 59,321,774 (93.68%) | 47,327,518 (95.86%)
With itself and mate mapped | 75,833,239 | 59,704,772 | 61,514,992 | 47,943,636
Singletons (%) | 114,910 (0.15%) | 98,858 (0.16%) | 559,983 (0.88%) | 535,037 (1.08%)
Mate mapped to a different chromosome | 174, | , | ,018 | 39,240
Mate mapped to a different chromosome (mapQ ≥ 5) | 156,396 | 93,541 | 65,912 | 27,128

QC: quality control.
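The percentages in Table 4.3 are each count divided by the total number of reads in that sample. A small sketch that recomputes them from the tabulated counts (the dictionary layout and the `pct` helper are illustrative):

```python
# Read counts copied from Table 4.3.
FLAGSTAT = {
    "Patient 1-II-2 germline": {
        "total": 76_335_490, "mapped": 75_948_149,
        "properly_paired": 75_457_268, "singletons": 114_910,
    },
    "Patient 2-II-1 germline": {
        "total": 60_113_342, "mapped": 59_803_630,
        "properly_paired": 59_445_380, "singletons": 98_858,
    },
    "Patient 1-II-2 tumour": {
        "total": 63_327_174, "mapped": 62_074_975,
        "properly_paired": 59_321_774, "singletons": 559_983,
    },
    "Patient 2-II-1 tumour": {
        "total": 49_372_458, "mapped": 48_478_673,
        "properly_paired": 47_327_518, "singletons": 535_037,
    },
}


def pct(count, total):
    """Percentage of `total`, rounded to two decimal places."""
    return round(100 * count / total, 2)


if __name__ == "__main__":
    for sample, counts in FLAGSTAT.items():
        total = counts["total"]
        print(
            sample,
            f"mapped {pct(counts['mapped'], total)}%",
            f"properly paired {pct(counts['properly_paired'], total)}%",
            f"singletons {pct(counts['singletons'], total)}%",
        )
```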

Local realignment was performed using GATK RealignerTargetCreator. For Patient 1-II-2 there were 3,793,051 (2.72%) reads filtered out during the traversal. Of these, 224,019 reads failed the bad mate filter, 3,569,018 reads failed the mapping quality zero filter, and 14 reads failed the unmapped read filter. For Patient 2-II-1 there were 3,060,049 (2.79%) reads filtered out during the traversal. Of these, 121,551 reads failed the bad mate filter, 2,938,469 reads failed the mapping quality zero filter, and 29 reads failed the unmapped read filter.

For Patient 1-II-2, no reads were filtered out of 76,335,490 total reads in the germline sample, and no reads were filtered out of 63,327,174 total reads in the tumour sample. For Patient 2-II-1, no reads were filtered out of 60,113,342 total reads in the germline sample, and no reads were filtered out of 49,372,458 total reads in the tumour sample.

BAM quality control

GATK depth of coverage results are presented in Figure 4.3. As expected, the majority of bases were covered at a depth of 100X or less in each sample. Germline samples (blue) for both patients show slightly higher coverage compared to the tumour samples (orange).
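The filter counts above can be checked for internal consistency. The sketch below sums the per-filter counts and expresses the total as a percentage of the reads traversed. The text does not state the denominator; taking it to be the combined germline and tumour read totals is an assumption, although it does reproduce the reported 2.72% and 2.79%.

```python
def summarise_filters(total_reads, **failed):
    """Return (reads filtered, percentage of total_reads filtered)."""
    filtered = sum(failed.values())
    return filtered, round(100 * filtered / total_reads, 2)


if __name__ == "__main__":
    # Assumed denominators: germline + tumour totals from Table 4.3.
    patient_1 = summarise_filters(
        76_335_490 + 63_327_174,
        bad_mate=224_019, mapping_quality_zero=3_569_018, unmapped=14,
    )
    patient_2 = summarise_filters(
        60_113_342 + 49_372_458,
        bad_mate=121_551, mapping_quality_zero=2_938_469, unmapped=29,
    )
    print(patient_1, patient_2)
```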

Figure 4.3: Genome Analysis Toolkit depth of coverage summary for Patient 1-II-2 and Patient 2-II-1 germline and tumour DNA. Panels: (a) Patient 1-II-2; (b) Patient 2-II-1. x-axis: depth (>=0 to >=500); y-axis: number of bases; series: germline and tumour.

The average insert size for the germline samples of both Patient 1-II-2 and Patient 2-II-1 is approximately 150 base pairs. The tumour samples have slightly smaller insert sizes of approximately 125 base pairs and 140 base pairs for Patient 1-II-2 and Patient 2-II-1, respectively. Figure 4.4 shows histogram plots of the insert size distribution for both patients' germline and tumour samples generated by Picard (Patient 1-II-2: top panels, Patient 2-II-1: bottom panels).

Figure 4.4: Insert size histogram plots generated by Picard for Patient 1-II-2 and Patient 2-II-1 germline and tumour samples. Panels: (a) Patient 1-II-2 germline; (b) Patient 1-II-2 tumour; (c) Patient 2-II-1 germline; (d) Patient 2-II-1 tumour.

High-level metrics about the alignment of reads within a *.bam file were produced by the CollectAlignmentSummaryMetrics tool from Picard. All the reads from both patients' germline and tumour samples passed the filter criteria, and the percentage of reads aligned was above 98% for all samples.

Somatic variant calling

VarScan2

VarScan2 identified 4,888 somatic variants in Patient 1-II-2, of which 702 were classed as high confidence. Patient 2-II-1 had 2,667 somatic variants, with 595 classed as high confidence. The results of the somaticFilter command (to remove possible false positives) for the SNV somatic high confidence files are presented in Table 4.4. Most of the variants that were removed from both patients failed the Reads2 requirement (minimum supporting reads for a variant).

Table 4.4: Results from VarScan2 somaticFilter to remove possible false positives from the high confidence somatic calls for Patient 1-II-2 and Patient 2-II-1

Filter | Patient 1-II-2 | Patient 2-II-1
Total variants in input | |
Coverage requirement (10) | 3 | 7
Reads2 requirement (2) | |
VarFreq requirement (0.2) | 0 | 0
p-value requirement (1 x 10^-1) | 2 | 14
SNP clusters requirement | 0 | 4
Near INDELs | 0 | 0
Passed | |

Reads2: minimum supporting reads for a variant filter. VarFreq: minimum variant allele frequency filter. SNP: single nucleotide polymorphism. INDEL: insertion or deletion.
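The per-variant thresholds behind Table 4.4 (set in Table 4.1) can be expressed as a simple predicate. The sketch below is a re-implementation for illustration, not VarScan2's code: the class and function names and the order in which checks are applied are assumptions, and the SNP-cluster and near-INDEL filters, which need neighbouring variants, are not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FilterSettings:
    """Thresholds from Table 4.1."""
    min_coverage: int = 10
    min_reads2: int = 2
    min_strands2: int = 1
    min_avg_qual: int = 20
    min_var_freq: float = 0.2
    p_value: float = 0.1


def first_failed_filter(coverage, reads2, strands2, avg_qual, var_freq,
                        p_value, settings=FilterSettings()):
    """Return the name of the first threshold a variant fails, or None."""
    checks = [
        ("coverage", coverage >= settings.min_coverage),
        ("reads2", reads2 >= settings.min_reads2),
        ("strands2", strands2 >= settings.min_strands2),
        ("avg_qual", avg_qual >= settings.min_avg_qual),
        ("var_freq", var_freq >= settings.min_var_freq),
        ("p_value", p_value <= settings.p_value),
    ]
    for name, passed in checks:
        if not passed:
            return name
    return None


if __name__ == "__main__":
    # Invented variant: good depth but only one supporting read.
    print(first_failed_filter(30, 1, 1, 35, 0.25, 1e-4))
```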

Bonferroni adjustment was performed on the p-values from VarScan2 to correct for multiple testing. As Patient 1-II-2 had 66,265,606 positions for comparison, the significance level after Bonferroni correction was α < 7.55 x 10^-10. After correcting for multiple testing, Patient 1-II-2 had 11 statistically significant somatic variants. Patient 2-II-1 had 67,054,165 positions for comparison; therefore, the significance level after Bonferroni correction was α < 7.46 x 10^-10. After correcting for multiple testing, Patient 2-II-1 had three statistically significant somatic variants.

Validation of somatic variants using Strelka

Of the 11 somatic variants identified by VarScan2 in Patient 1-II-2, ten were also reported as somatic variants by Strelka (Table 4.5). A variant in the CCDC66 gene (position 19: ) was reported by VarScan2 but not by Strelka. All three somatic variants identified by VarScan2 in Patient 2-II-1 were confirmed by Strelka (Table 4.6). The variants were annotated and cross-referenced against a provisional gene exclusion list, but no variants were removed.
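The Bonferroni-corrected significance level is the family-wise α divided by the number of comparisons. The family-wise α is not stated in the text; the sketch below assumes α = 0.05, a value that reproduces thresholds of the form 7.55 x 10^-10 and 7.46 x 10^-10 for the two position counts given above.

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance level after Bonferroni correction.

    alpha=0.05 is an assumed family-wise level, not stated in the text.
    """
    return alpha / n_tests


def is_significant(p_value, n_tests, alpha=0.05):
    """True if p_value survives Bonferroni correction."""
    return p_value < bonferroni_threshold(n_tests, alpha)


if __name__ == "__main__":
    for patient, n in (("1-II-2", 66_265_606), ("2-II-1", 67_054_165)):
        print(f"Patient {patient}: alpha < {bonferroni_threshold(n):.2e}")
```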

Table 4.5: Somatic variants identified by VarScan2 and Strelka for Patient 1-II-2

Chr:Pos | Gene | p-value | Function | SIFT | PolyPhen-2 | RegulomeDB | GERP | Ref | Alt | Read depth (Ref, alt)
chr14: | PRMT5 | | NS | D | B | | | G | A | 93,76
chr9: | ASPN | 5.55 x | NS | T | D | | | T | A | 60,51
chr6: | LAMA2 | | NS | T | D | | | G | A | 67,49
chr4: | TET2 | | NS | T | B | | | C | T | 61,55
chr18: | FHOD3 | | NS | T | D | | | A | T | 54,54
chr19: | GATAD2A | 3.78 x | NS | T | D | | | G | A | 9,17
chr14: | ADSSL1 | | S | . | . | 2b | . | C | T | 41,34
chr3: | P4HTM | 5.26 x | NS | . | B | 2b | | A | C | 54,26
chr11: | SLC22A20, POLA2 | | IG | . | . | 5 | . | G | T | 21,15
chr9: | ABL1 | | S | . | . | 5 | . | G | C | 53,32

Chr:Pos: Chromosome:position. p-value: Fisher's p-value. SIFT: Sorting Intolerant From Tolerant. PolyPhen-2: Polymorphism Phenotyping-2. GERP: Genomic Evolutionary Rate Profiling score (a positive GERP score represents a substitution deficit, while a negative GERP score represents a substitution surplus). Ref: reference allele. Alt: alternate allele. NS: nonsynonymous. S: synonymous. IG: intergenic. D: deleterious. B: benign. T: tolerated.

Table 4.6: Somatic variants identified by VarScan2 and Strelka for Patient 2-II-1

Chr:Pos | Gene | p-value | Function | SIFT | PolyPhen-2 | RegulomeDB | GERP | Ref | Alt | Read depth (Ref, alt)
chr8: | SDR16C6P, PENK | 9.39 x | Intergenic | . | . | . | . | C | T | 44,32
chr5: | SLC6A18 | | Splicing | | | | | G | A | 10,12
chr5: | PLK2 | | Intronic | . | . | 4 | . | T | C | 31,23

Chr:Pos: Chromosome:position. p-value: Fisher's p-value. SIFT: Sorting Intolerant From Tolerant. PolyPhen-2: Polymorphism Phenotyping-2. GERP: Genomic Evolutionary Rate Profiling score (a positive GERP score represents a substitution deficit, while a negative GERP score represents a substitution surplus). Ref: reference allele. Alt: alternate allele. D: deleterious. B: benign.

Evidence further supporting somatic risk genes

Table 4.7 summarises findings from several in silico resources for the somatic risk variants and the genes in which they arise for both patients. None of the somatic risk variants were reported in COSMIC. However, all but one of the genes (SDR16C6P) were reported to have mutations in the COSMIC database. Two genes were listed in the COSMIC cancer gene census (TET2 and ABL1). The ABL1 and PENK genes were reported in the PubMeth database. None of the other genes were reported in PubMeth, which suggests that PubMeth currently holds no evidence of methylation of these genes in cancer. Evidence from NCBI suggests that ten genes have been reported to have gene functions that support involvement in cancer. Of these ten genes, six (PRMT5, LAMA2, TET2, FHOD3, ABL1 and PLK2) were reported to have Gene References into Functions (GeneRIF) evidence for involvement in cancer pathogenesis. Two genes (POLA2 and PENK) had GeneRIF entries indicating that they might be biomarkers for cancer, and two genes (ASPN and P4HTM) had GeneRIF entries suggesting that they may be targets for cancer therapeutics.

The PubMed searches for the candidate somatic risk genes are summarised in Table 4.8. The searches revealed previously published associations between the genes and cancer, except for the ADSSL1 and SDR16C6P genes. Therefore, these searches found no evidence supporting the involvement of ADSSL1 and SDR16C6P in cancer pathogenesis at this time.

Table 4.7: Summary of findings from in silico resources investigating the role of somatic risk variants and the genes in which they arise in cancer pathogenesis

Gene | Genomic location | Variant in COSMIC | No. mutations in COSMIC | Cancer gene census | SuperPath | GO molecular function | Methylation | GeneRIF
PRMT5 | chr14: | No | 95 | No | Regulation of TP53 activity; Chromatin organisation; Gene expression; Transport of the SLBP independent mature mRNA; RNA transport | Core promoter sequence-specific DNA binding; Transcription corepressor activity; Protein binding; Methyltransferase activity; Methyl-CpG binding | No | Colorectal cancer pathogenesis; Acute myeloid leukemia growth; Marker of poor prognosis in nasopharyngeal carcinoma; Marker for early colorectal carcinomas
ASPN | chr9: | No | 96 | No | ECM proteoglycans; Degradation of the extracellular matrix | Protein kinase inhibitor activity; Calcium ion binding; Collagen binding | No | Role in gastric cancer; Therapeutic target molecule
LAMA2 | chr6: | No | 854 | No | Integrin pathway; ERK signalling; Arrhythmogenic right ventricular cardiomyopathy; Dilated cardiomyopathy; Focal adhesion | Receptor binding; Structural molecule activity | No | Mutations in hepatocellular carcinoma patients
TET2 | chr4: | No | 2,726 | Yes | Activated PKN1 stimulates transcription of androgen receptor regulated genes; Chromatin regulation / Acetylation; Gene expression | Sulfonate dioxygenase activity; DNA binding; Protein binding; Ferrous iron binding; Zinc ion binding | No | Involved in leukemogenesis; Oncogenic role in myeloid tumour
FHOD3 | chr18: | No | 393 | No | . | Actin binding; Protein binding | No | Glioma linear migration; Associated with acute lymphoblastic leukemia; Promotes invasive migration and local invasion in vivo
GATAD2A | chr19: | No | 95 | No | Activated PKN1 stimulates transcription of androgen receptor regulated genes; Chromatin organisation; Gene expression; Regulation of TP53 activity | Contributes to RNA polymerase II regulatory region sequence-specific DNA binding; Transcription factor activity, sequence-specific DNA binding; Protein binding; Zinc ion binding; Protein binding, bridging | No | .
ADSSL1 | chr14: | No | 114 | No | Purine metabolism; Purine nucleotides de novo biosynthesis; Metabolism; purine metabolism; Alanine, aspartate and glutamate metabolism | Magnesium ion binding; GTPase activity; Adenylosuccinate synthase activity; GTP binding; Ligase activity | No | .
P4HTM | chr3: | No | 75 | No | . | Iron ion binding; Calcium ion binding; Oxidoreductase activity | No | May aid the design of novel therapies for inhibiting bone tumours
SLC22A20 | chr11: | No | 83 | No | . | Inorganic anion exchanger activity; Sodium-independent organic anion transmembrane transporter activity | No | .
POLA2 | chr11: | No | 108 | No | Telomere C-strand synthesis; E2F mediated regulation of DNA replication; Regulation of activated PAK-2p34 by proteasome mediated degradation; Purine metabolism; Cell cycle, Mitotic | DNA binding; DNA-directed DNA polymerase activity; Protein heterodimerisation activity | No | Prognostic biomarker in non small cell lung cancer pathogenesis
ABL1 | chr9: | No | 1,684 | Yes | DNA double-strand break repair; Development Slit-Robo signalling; Regulation of actin dynamics for phagocytic cup formation; Cell cycle; ErbB signalling pathway | Magnesium ion binding; DNA binding; Actin monomer binding; Nicotinate-nucleotide adenylyltransferase activity; Protein kinase activity | Yes | BCR/ABL oncogene in leukaemia; Promote breast cancer osteolytic metastasis; Progression of gastric cancer
SDR16C6P | chr8: | No | . | No | . | . | No | .
PENK | chr8: | No | 163 | No | Apoptotic pathways in synovial fibroblasts; GPCR pathway; ERK signalling; Nanog in Mammalian ESC Pluripotency; CREB Pathway | Opioid peptide activity; Neuropeptide hormone activity; Opioid receptor binding | Yes | Promoter methylation associated with colorectal adenocarcinoma diagnosis
SLC6A18 | chr5: | No | 186 | No | Transport of glucose and other sugars, bile salts and organic acids, metal ions and amine compounds; Amino acid transport across the plasma membrane | Neurotransmitter:sodium symporter activity; Amino acid transmembrane transporter activity; Symporter activity | No | .
PLK2 | chr5: | No | 153 | No | FoxO signalling pathway; Gene expression; TP53 Regulates transcription of cell cycle genes; DNA damage; Regulation of TP53 activity | Nucleotide binding; Protein kinase activity; Protein serine/threonine kinase activity; Signal transducer activity; Protein binding | No | Promoting tumour progression; Increases cell proliferation and decreases apoptosis in gastric cancer cells

Genomic location: chromosome:position. COSMIC: Catalogue of Somatic Mutations in Cancer database. 134 No. mutations in COSMIC: the number of mutations reported in the gene in the COSMIC database. Cancer gene census: is the gene reported in the cancer gene census in COSMIC? The cancer gene census is a catalogue of genes for which mutations have been causally implicated in cancer. SuperPath: from PathCards, an integrated database of human pathways and their annotations. Human pathways were clustered into SuperPaths based on gene content similarity. GO molecular function: Gene Ontology molecular function. 294 Methylation: is the gene reported to be methylated in cancer by PubMeth? 295 GeneRIF: Gene References Into Functions from the National Center for Biotechnology Information. Are any GeneRIF associated with cancer reported for the gene? SLBP: stem-loop binding protein. ECM: extracellular matrix. ERK: extracellular signal-regulated kinase. GTP: guanosine triphosphate. E2F: E2 factor. ErbB: erythroblastosis oncogene B. GPCR: G-protein-coupled receptors. ESC: embryonic stem cells. CREB: cAMP response element-binding protein.

Table 4.8: Summary of search results from PubMed for genes in which somatic variants were identified

Gene | No. of publications | Role of gene | Selected references
PRMT5 | 156 | PRMT5 is a regulator of homologous recombination-mediated double-strand break repair. PRMT5 methyltransferase activity is necessary for tumour cell proliferation and plays an important role in cancer progression by repressing the expression of key tumour suppressor genes. Mutations in PRMT5 are associated with gastric cancer, oropharyngeal squamous cell carcinoma, hepatocellular carcinoma, prostate cancer, lung adenocarcinoma, lung squamous cell carcinoma, endometrial carcinoma and breast carcinoma. |
ASPN | 33 | ASPN is a secreted small leucine-rich proteoglycan with known roles in ligament regulation and chondrogenesis. It is a potential mediator of metastatic progression found within the tumour microenvironment. ASPN has been shown to play a role in breast cancer, scirrhous gastric cancer, pancreatic cancer and prostate cancer. | 404
LAMA2 | 22 | LAMA2 is functionally involved in the formation of the extracellular matrix and is upregulated in metastatic renal cell carcinoma and during serum-induced differentiation of glioma-initiating cells. Downregulation of LAMA2 has been reported in oesophageal cancer, in the extracellular matrix of a drug-resistant ovarian cancer cell line, in hepatocellular carcinoma and in laryngeal cancer. Abnormal methylation has been reported in breast cancer and colorectal cancer. LAMA2 is a candidate marker and indicator of poor prognosis for posterior fossa subgroup A ependymal tumours. |
TET2 | 664 | TET2 is an epigenetic regulator that is frequently mutated or inactivated in cancer, and it has been suggested that the TET proteins may protect against abnormal DNA methylation at promoters. TET2 mutations are frequently observed in myeloid, lymphoid and hematological malignancies. |
FHOD3 | 11 | FHOD3 is involved in cancer cell migration and invasion via regulation of dynamic actin spike assembly in cells invading in vitro and in vivo. FHOD3 plays a role in glioma linear migration motility. FHOD3 was hypomethylated, overexpressed and involved in major deletions and may play a role in thyroid cancer. FHOD3 mutations in leukaemia are associated with accumulation of methotrexate polyglutamates. |
GATAD2A | 5 | GATAD2A is a subunit of the nucleosome remodelling and histone deacetylase complex, a chromatin-level regulator of transcription with a number of important and emerging roles in cancer biology. Knockdown of GATAD2A decreased cell proliferation and colony formation and promoted apoptosis in thyroid cancer cells. A variant in GATAD2A is associated with susceptibility to three cancers (breast, ovarian and prostate). |
ADSSL1 | | |
P4HTM | 1 | P4HTM was found to be hypermethylated in rhabdomyosarcoma. P4HTM silencing by promoter DNA methylation is a potential mechanism for HIF 1α stabilisation in rhabdomyosarcomas. |
SLC22A20 | 3 | SLC22A20 (OAT6) acts as an uptake carrier of sorafenib. SLC22A20 is differentially methylated in hepatocellular carcinoma. | 442, 443
POLA2 | 8 | POLA2 has been reported to be involved in cell proliferation by mediating DNA replication, recombination, and repair. A variant in POLA2 improves differential survivability and mortality in non-small cell lung cancer patients and could be used as a prognostic biomarker. The knockdown of POLA2 increases gemcitabine resistance in human lung cancer cells. Low mRNA expression of POLA2 was prognostic of poor outcome in ovarian carcinomas. A POLA2-CDC42EP2 read-through fusion transcript has been identified in gastrointestinal stromal tumours. POLA2 was found to be overexpressed in mesothelioma. | 322
ABL1 | 1,368 | The product of the ABL1 gene is a tyrosine kinase that plays a role in cellular growth control and the response to DNA damage. The BCR-ABL1 (Philadelphia chromosome) gene fusion is responsible for > 95% of chronic myeloid leukemia. Mutations in the BCR-ABL1 gene have been found to be a major cause of disease progression and resistance to tyrosine kinase inhibitors in chronic myeloid leukemia patients. Methylation of the proximal promoter of the ABL1 oncogene is a common epigenetic alteration associated with clinical progression of chronic myeloid leukemia. ABL1 was first identified as an oncogene in leukaemia, but mutations have also been reported in lung cancer. |
SDR16C6P | | |
PENK | 99 | PENK is a candidate tumour suppressor gene that is hypermethylated in various cancers. PENK is also a potential biomarker for prostate, colorectal and bladder cancer. Hypermethylation of PENK contributes to cell motility and adhesion. |
SLC6A18 | 1 | Gain of 5p15.33 (harbouring the SLC6A18 gene) has been reported in non-small cell lung cancer cases. |
PLK2 | 99 | PLK2 plays a critical role in the cell cycle and the response to DNA damage. PLK2 plays a tumour suppressor role in cervical cancer, ovarian cancer, gastric cancer and hematopoietic diseases. PLK2 is involved in paclitaxel resistance in solid tumours. PLK2 phosphorylates TAp73, resulting in inhibited cell proliferation, increased apoptosis, G1 phase arrest, and decreased cell invasion. Protein kinases represent the most effective class of therapeutic targets in cancer. |

The PubMed search was performed using the string ("gene name") AND (cancer OR malignancy OR tumor* OR tumour* OR sarcoma) in April. Abstracts were screened for relevance to the current study.
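The search string in the table note can be generated per gene. The sketch below only builds the query text; it does not contact PubMed. The function name is illustrative, and the double quotes around the gene symbol are an assumption about how the original string was written.

```python
CANCER_TERMS = "(cancer OR malignancy OR tumor* OR tumour* OR sarcoma)"


def pubmed_query(gene):
    """Build the PubMed search string for one gene symbol."""
    return f'("{gene}") AND {CANCER_TERMS}'


if __name__ == "__main__":
    for gene in ("PRMT5", "TET2", "PLK2"):
        print(pubmed_query(gene))
```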

Drug sensitivity

Two of the genes identified as of interest (TET2 and LAMA2) were reported in the Genomics of Drug Sensitivity in Cancer database. 391 The TET2 gene showed a statistically significant association (p-value < 10^-3) with VNLG/124 and Bexarotene in pan-cancer analysis (drug sensitivity for cell lines from all cancer types, with genomic features identified from the analysis of patient tumours across multiple cancer types). 391 While the LAMA2 gene was reported in the Genomics of Drug Sensitivity in Cancer database, none of its associations reached statistical significance.

VNLG/124 is a novel mutual prodrug of all-trans-retinoic acid (ATRA) and histone deacetylase inhibitors (HDIs). 475 TET2 mutations determine sensitivity to ATRA. ATRA has previously been shown to induce the interaction and chromatin recruitment of a novel RARβ-TET2 complex to epigenetically activate a specific cohort of target genes. 476 Wu et al. (2017) reported a novel RARβ-TET2-miR-200c-PKCζ signalling pathway that directs cancer cell state changes and may have therapeutic implications. 476 Bexarotene is a selective retinoid X receptor (RXR) agonist with properties overlapping those of ATRA. 477 Bexarotene exerts its effects by blocking cell cycle progression, inducing apoptosis and differentiation, preventing multidrug resistance, and inhibiting angiogenesis and metastasis. 477 It is therefore a promising chemopreventive agent against cancer. 477

None of the remaining genes harbouring significant somatic mutations were listed in the Genomics of Drug Sensitivity in Cancer database as of April. However, as the understanding of genes and pathways that are causally implicated in cancer grows, more therapeutics are likely to be added to the database.

Loss of heterozygosity variants

A total of 2,075 LOH variants were identified in Patient 1-II-2 using VarScan2. Of these, 507 were high confidence. After correcting for multiple testing and removing variants in polymorphic regions, 18 LOH variants were statistically significant (Table 4.9). Of these LOH variants, 16 were located on chromosome 16, and the remaining two were located on chromosome 19.

There were 1,344 LOH variants identified in Patient 2-II-1 using VarScan2, with 785 categorised as high confidence. After correcting for multiple testing, no LOH variants reached statistical significance for Patient 2-II-1.

Copy number analysis

The results of DNAcopy were visualised as SCNA graphs per chromosome for Patient 1-II-2 (Appendix I) and Patient 2-II-1 (Appendix J). The SCNA graphs for Patient 1-II-2 show a considerable disruption on chromosome 16. The SCNA graphs for Patient 2-II-1 do not show any large regions of disruption.

Table 4.9: Statistically significant high confidence loss of heterozygosity variants for Patient 1-II-2

Columns: Chr:Pos | Gene | Somatic p-value | Function | SIFT | PolyPhen-2 | RegulomeDB | GERP | Ref | Alt | Read depth (ref, alt) | MAF 1000G

chr16: JPH x S.. 5. C T 132,
chr16: POLR2C 1.22 x Int.... C T 11,
chr16: DNAAF x Int.. 1f. G T 6,
chr16: ZNF778, ANKRD x IG T P 3a -2 C T 9,
chr16: KLHL x S.. 1f. C T 44,
chr16: CMTR x S.... C T 10,
chr19: CEP x NS T P 2b 2.25 T G 52,0.
chr16: ZFP x S.. 3a. A G 82,
chr16: PLCG x S.. 3a. C T 82,
chr16: CHD x S.... G A 88,
chr16: GALNS 5.52 x S.. 4. C T 1,
chr16: JPH x S.... T C 6,
chr19: CEP x NS T B C T 34,0.
chr16: RBL x Int.. 6. T C 3,
chr16: WWP x S.. 4. C T 84,
chr16: LPCAT x NS T B G A 68,
chr16: PLEKHG x Int.. 1f. G A 7,
chr16: ZNF x UTR A G 70,

Chr:Pos: Chromosome:Position. SIFT: Sorting Intolerant from Tolerant score. PolyPhen-2: Polymorphism Phenotyping-2. GERP: Genomic Evolutionary Rate Profiling score (a positive GERP score represents a substitution deficit, while a negative GERP score represents a substitution surplus). Ref: reference allele. Alt: alternate allele. MAF 1000G: Minor Allele Frequency in 1000 Genomes Project. S: synonymous. NS: nonsynonymous. Int: intronic. IG: intergenic. UTR3: 3' untranslated region. D: deleterious. T: tolerated. B: benign. P: possibly damaging.

4.4 Discussion

In summary, ten somatic variants in Patient 1-II-2 and three somatic variants in Patient 2-II-1 were identified by VarScan2 and confirmed by Strelka. VarScan2 also identified a large region of LOH on chromosome 16 in Patient 1-II-2. This LOH region was supported by the SCNA results, which also indicated a region of SCNA on chromosome 16. Of the genes in which somatic mutations were identified, two were listed in the Genomics of Drug Sensitivity in Cancer database, indicating the potential clinical utility of these findings. Of the 13 genes in which somatic mutations were identified, 11 have previously been associated with cancer in the published literature.

Comparison of results in the context of published literature on myxoid liposarcoma genetics

The majority of myxoid liposarcomas are characterised by the presence of the reciprocal chromosomal translocation t(12;16)(q13;p11). This translocation creates the FUS-DDIT3 chimeric gene. 64 A smaller fraction of myxoid liposarcoma cases harbour a similar variant translocation and gene fusion, t(12;22)(q13;q12), which fuses the EWSR1 gene to the DDIT3 gene. 478 It is likely that these translocations are the primary genetic event essential for tumour formation. 479 However, in solid tumours, single base substitutions outnumber chromosomal translocations by at least one order of magnitude. 16 Therefore, sarcomas with fusion gene drivers may also harbour other driver gene mutations. 479 Myxoid liposarcoma can contain several additional molecular genetic alterations, including TP53, PIK3CA, and TERT mutations, which directly influence tumour cell biology and may be involved in round cell transformation, migration capacity, and differential response to drugs. Alterations of the TP53 pathway have also been described in myxoid liposarcoma. 480, 486, 487

One previous study performed a matched tumour and germline analysis on myxoid liposarcoma tumours. Joseph et al.
(2014) performed WES on eight fresh frozen surgically resected myxoid liposarcomas and matched blood samples. 488 A median of 10.8 (range 3-15) somatic mutations per tumour was reported,
consistent with the findings of this study (ten somatic mutations reported in Patient 1-II-2 and three in Patient 2-II-1). Joseph et al. reported one somatic variant in the FHOD3 gene (g.chr18: g>t). 488 However, this is a different FHOD3 variant from the one reported in this study.

A PubMed search was performed (May 2017) using the string ("gene name") AND ("myxoid liposarcoma") for each of the genes in which somatic mutations were identified. No results were returned for any gene except ABL1. It has previously been suggested that ABL1 may play a role in pre- and post-transcriptional regulatory networks that contribute to sensitivity to trabectedin treatment in myxoid liposarcoma patients. 489 The other genes in which somatic mutations were identified in Patient 1-II-2 and Patient 2-II-1 have not previously been reported in myxoid liposarcomas.

A cluster of 16 LOH variants on chromosome 16q was also identified in Patient 1-II-2. The SCNA plots for Patient 1-II-2 also highlight a region of SCNA on chromosome 16, which suggests that this may be the site of a significant genomic disruption in this patient. Of the 1,015 genes in this chromosomal region, 66 have previously been associated with cancer. Patient 1-II-2 had an LOH variant in one of the cancer genes located in the region of LOH on chromosome 16, RBL2, at position chr16: (rs ). Based on its minor allele frequency (MAF) in the 1000 Genomes Project European population, this is a common variant in the general population. The RegulomeDB score for this variant is 6, which indicates minimal binding evidence at this position. As this is an intronic variant, the effect of LOH at this position on the phenotype is unknown. All other patients in family 1 are also heterozygous at this position, except Patient 1-I-2 (unaffected), who is homozygous for the reference allele (Figure 4.5).
Patient 1-III-1 (Ewing's sarcoma) is also heterozygous at this position in the germline DNA; however, without a tumour sample for Patient 1-III-1 it is not possible to determine whether this variant also becomes homozygous for the alternate allele in the tumour DNA.

Figure 4.5: Pedigree of family 1 indicating genotypes for each patient at chr16: (rs ) in the RBL2 gene

[Pedigree diagram: individuals 1-I-1, 1-I-2, 1-II-1, 1-II-2 (sarcoma), 1-II-3, 1-III-1 (sarcoma) and 1-III-2. Key: affected male, affected female, unaffected male, unaffected female, proband; * patient selected for tumour-normal analysis.]

Genotype at position in RBL2 gene

Sample | Genotype | Read depth (ref, alt)
Patient 1-I-1 germline | T/C | 52,71
Patient 1-I-2 germline | T/T | 115,0
Patient 1-II-1 germline | T/C | 42,47
Patient 1-II-2 germline | T/C | 77,60
Patient 1-II-2 tumour | C/C | 3,60
Patient 1-II-3 germline | T/C | 70,46
Patient 1-III-1 germline | T/C | 55,39
Patient 1-III-2 germline | T/C | 34,55
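The read depths in Figure 4.5 show how LOH appears in the sequencing data. As a minimal sketch (this is not the VarScan2 procedure, which applies its own statistical test), the alternate allele fraction can be computed from the reference and alternate read counts. The function name is hypothetical and introduced only for illustration; the read counts are those reported for Patient 1-II-2.

```python
def alt_allele_fraction(ref_reads: int, alt_reads: int) -> float:
    """Fraction of reads supporting the alternate allele at one position."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no reads cover this position")
    return alt_reads / total


# Read depths (ref, alt) for Patient 1-II-2, taken from Figure 4.5.
germline_fraction = alt_allele_fraction(77, 60)  # close to 0.5: heterozygous T/C
tumour_fraction = alt_allele_fraction(3, 60)     # close to 1.0: reference allele largely lost

print(f"germline: {germline_fraction:.2f}, tumour: {tumour_fraction:.2f}")
```

A germline fraction near 0.5 alongside a tumour fraction near 1.0 is the pattern expected for loss of the reference allele; the few remaining reference reads in the tumour could reflect normal tissue in the block, which is one reason tumour purity matters for this kind of call.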

The Retinoblastoma-Like 2 (RBL2) gene, also known as RB2 or p130, is a tumour suppressor gene that has been implicated in endometrial cancer, intraocular melanoma, 496, 497 lung cancer, 508 nasopharyngeal cancer, neuroblastoma, and retinoblastoma. The Retinoblastoma (Rb) protein family plays an important role in regulating other cellular processes, such as terminal differentiation and senescence. 528 Previous studies have also shown that Rb proteins are differentially regulated during adipogenic differentiation of pre-adipocyte cell lines, 529, 530 suggesting that an absence of RB1 or RB2 may promote adipogenesis. 531 Human bone marrow-derived mesenchymal stromal cells (hMSCs) are multipotent cells that, under defined conditions, can differentiate into multiple connective tissue cell types, such as adipocytes, osteoblasts, chondrocytes, and myoblasts. 532 Differentiation of hMSCs into different lineages involves complex regulation and transcriptional activation or repression of a vast number of genes, and disruption of this regulation can have severe pathological consequences, such as cancer development. 533, 534

A second cancer gene of interest in the region of LOH on chromosome 16 in Patient 1-II-2 is the fused in sarcoma (FUS) gene, although no significant variants were reported by VarScan2 in this gene. The FUS gene is involved in the specific translocation of myxoid liposarcomas, t(12;16)(q13;p11). 535 This translocation fuses exon 5, 7, or 8 of the FUS gene with exon 2 of the DDIT3 gene. The FUS gene, also known as translocated in liposarcoma (TLS), is involved in pre-messenger ribonucleic acid (mRNA) splicing and the export of fully processed mRNA to the cytoplasm. 536 The encoded protein belongs to the FET family of RNA-binding proteins (consisting of FUS, EWS and TAF15), which have been implicated in cellular processes that include regulation of gene expression, maintenance of genomic integrity and mRNA/microRNA processing.
537 FET genes are directly involved in deleterious genomic rearrangements, primarily in sarcomas and leukaemia. 538 Given that Patient 1-II-2 was diagnosed with a myxoid liposarcoma (a tumour derived from primitive cells that undergo adipose differentiation), the region identified on chromosome 16 may be significant. Chromosome 16 shows a vast region of LOH that encompasses both the RBL2 and FUS genes, as well as 64 other known cancer genes and numerous SCNA events that may contribute to tumour pathogenesis in this patient.

4.4.2 Strengths

A strength of the current study is the confirmation of statistically significant somatic variants using a second, independent variant caller. Many cancer sequencing studies have relied on a single calling pipeline to generate candidates. However, consensus between different callers is imperfect; therefore, the results from a single caller should not be over-interpreted. 539 Each calling algorithm has different weaknesses, and VarScan2 tends to return a very high total number of calls, which indicates low specificity. 540 Using more than one algorithm with different biases may reduce the number of false positives. 539 Therefore, statistically significant somatic variants called by VarScan2 were validated using a second somatic variant caller, Strelka. Of the 14 statistically significant somatic variants called by VarScan2 (11 in Patient 1-II-2 and three in Patient 2-II-1), 13 were also called by Strelka (93%). As these somatic variants were called by two independent callers, they are less likely to be false positives. Nevertheless, these variants should be validated using Sanger sequencing. However, this was beyond the scope of the current project.

Limitations

The analysis of matched tumour and germline data has several unique challenges, including accounting for heterogeneity from subclonal variation and sample impurity. 148, 541, 542 The nature of cancer tissue makes somatic variant calling a challenging task. 540 The tumour DNA for this analysis was extracted from FFPE samples collected more than ten years earlier. It is difficult to determine tumour purity and heterogeneity from DNA extracted in this manner, as it is impossible to verify whether the block contained a mixture of tumour and adjacent normal tissue, or whether the tumour contained heterogeneous cell populations.
Therefore, the tumour purity and effects of heterogeneity could not be taken into account in this analysis but should be considered in future studies.

Other issues that arise from using FFPE samples for WES are artefacts such as fragmentation and artificial base alterations. FFPE samples can be a good resource for the discovery of biomarkers in cancer using WES, but fresh frozen tissue is preferred as it minimises damage to nucleotides. 543 There are also sources of error from the mapping and sequencing processes. In general, data generated on the Illumina platform have increased error rates at the ends of reads, a tendency towards transversion base call errors, a low INDEL error rate, and systematic sequence-specific errors following inverted repeat sequences and GG motifs.

The matched tumour and germline comparisons were performed on only two of the five sarcoma cases from the three cancer cluster families described in Chapter 2. A clearer picture of the full somatic mutation burden in the three cancer cluster families could be achieved by performing a matched tumour and germline analysis for all sarcomas and other cancers in these families. However, tumour DNA was not available for these patients.

Summary

In summary, 13 novel somatic mutations were identified in two myxoid liposarcoma patients. Two of the genes in which somatic mutations were identified (FHOD3 and ABL1) have previously been associated with myxoid liposarcoma in the literature. A large region of LOH and SCNA on chromosome 16q that includes the genes FUS and RBL2 was reported in Patient 1-II-2, which suggests that this chromosomal region may contribute to tumour pathogenesis in this patient. The genes in which somatic and LOH variants were identified are candidates for further investigation. Independent experimental validation should be performed to screen additional myxoid liposarcomas for variants in these candidate genes. Further functional studies could be carried out to determine the role of these variants or genes in myxoid liposarcoma pathogenesis.
Due to time and budget limitations, these types of functional studies are beyond the scope of this thesis.

Chapter 5

Aim 4: Variant burden analyses at candidate risk loci in sarcoma cases and healthy ageing controls

5.1 Introduction

In genetic studies of complex human diseases such as cancer, the validation of candidate risk variants is an important and often rate-limiting step. Existing single variant association tests are underpowered for validating rare risk variants unless sample or effect sizes are large. 554, 555 A more robust approach involves combining information across variants in a target region, such as a gene. 556 Burden tests combine rare and common variants across a gene or target region and compare an aggregate statistic between cases and controls. 272, 557, 558 A simple approach is to summarise the genotype information by counting the number of minor alleles across all variants in the target region. 556 In this chapter, the candidate risk loci identified in Chapter 3 and Chapter 4 are assessed by a case-control variant burden analysis to evaluate the full mutational burden of these regions.
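The simple counting approach described above can be illustrated with a short sketch. It assumes per-sample genotypes coded as minor-allele counts (0, 1 or 2) for each variant in one target region; the sample names, genotype values and function names are hypothetical and are not data from this thesis.

```python
from statistics import mean
from typing import Dict, List

# Hypothetical genotypes: minor-allele counts (0, 1 or 2) for each variant
# in a single target region, keyed by sample identifier.
Genotypes = Dict[str, List[int]]


def burden_scores(genotypes: Genotypes) -> Dict[str, int]:
    """Collapse each sample's genotypes into one burden score (sum of minor alleles)."""
    return {sample: sum(counts) for sample, counts in genotypes.items()}


def mean_burden(genotypes: Genotypes) -> float:
    """Aggregate statistic for a cohort: the mean burden score per sample."""
    return mean(burden_scores(genotypes).values())


cases = {"case_1": [0, 1, 0, 2], "case_2": [1, 0, 0, 0]}
controls = {"control_1": [0, 0, 0, 0], "control_2": [0, 1, 0, 0]}

print(mean_burden(cases), mean_burden(controls))
```

Collapsing to a single score per sample is what gives burden tests their power for rare variants, but it also means that variants with opposite directions of effect cancel each other out.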

5.1.1 Variant burden analyses in sarcoma cohorts

A case-control rare variant burden analysis has previously been performed using sarcoma cases from the International Sarcoma Kindred Study (ISKS) cohort. 178 Targeted exon sequencing was performed on 72 genes associated with increased cancer risk in 1,162 sarcoma cases (including 966 from the ISKS) and 6,545 Caucasian controls. Ballinger et al. found an excess of pathogenic germline variants (combined odds ratio (OR) = 1.43), with approximately half of the sarcoma cases found to have putatively pathogenic monogenic and polygenic variation in known and novel cancer genes. 178 This study found a measurable contribution of polygenic effects to sarcoma risk by rare variant burden analysis of cases and controls. 178

A variant burden analysis was also performed using 175 Ewing's sarcoma patients, including patients from the International Cancer Genome Consortium (100 patients) and the Pediatric Cancer Genome Project (19 patients). 559 Pathogenic and likely pathogenic mutations were found in 13.1% of Ewing's sarcoma cases, which is significantly higher than the frequency for the same genes in the Exome Aggregation Consortium (ExAC) database (53,105 subjects). 559 Brohl et al. found that pathogenic mutations were highly enriched in genes involved in DNA damage repair and cancer predisposition syndromes. 559 A table of genes identified by Ballinger et al. and Brohl et al. can be found in Appendix L.

5.2 Methods

Study participants

A total of 561 sarcoma cases were selected from the ISKS, 175 described previously in Chapter 2. Briefly, the ISKS was initiated in 2008 and is a global resource for researchers to investigate the hereditary characteristics of sarcoma. 175 Patients with sarcoma were recruited from major sarcoma treatment centres across Australia, France, New Zealand, India, the United States of America (USA), the United Kingdom, and Canada, regardless of their family history of cancer.
175 Individuals with adult-onset sarcoma (> 15 years old) were eligible for the ISKS.

A total of 1,144 healthy ageing cancer-free controls were selected from the Medical Genome Reference Bank (MGRB) program. 560 The MGRB program is a collaborative project between the New South Wales State Government and the Garvan Institute of Medical Research to sequence healthy, older individuals to create a high quality database that is depleted of damaging genetic variants. 560 The MGRB program utilises participants from an existing cohort, the ASPirin in Reducing Events in the Elderly (ASPREE) Study. 561 The ASPREE Study is an international clinical trial to determine whether daily low-dose aspirin improves the quality of life for 19,000 older people in Australia and the USA.

Whole genome sequencing

Whole genome sequencing (WGS) for ISKS cases and MGRB controls was performed by collaborators at the Garvan Institute of Medical Research. Cases and controls were sequenced at one lane per sample on the Illumina HiSeq X Ten platform using TruSeq Nano chemistry (2 x 150 base pair paired-end reads, > 30X mean depth for all samples). Samples passing FastQC 394 and verifyBamID 562 contamination filters were mapped to the 1000 Genomes Project hs37d5 reference 563 with an additional PhiX decoy, and small variants were called using the Genome Analysis Toolkit (GATK) 3.7 best practices pipeline. 564 The hs37d5 reference is the hg19-based reference genome employed by the 1000 Genomes Project for Phase 3 analysis. This genome differs from the hg19 genome in that it includes 35 Mb of human sequence as an additional contig (hs37d5). Variants passing variant quality score recalibration (VQSR) tranche thresholds of 99.5% (single nucleotide polymorphisms) and 99.0% (insertions and deletions) were retained to summarise frequencies.

Genomic regions selected for validation

Table 5.1 contains eight target regions that were identified in Chapter 3 (ABCB5, ARHGAP39, BEAN1, C16orf96, KIF2C, PDIA2, UVSSA and ZFP69B).
These target regions are genes in which germline risk variants segregating with cancer and age at onset of cancer in three cancer-cluster families were identified using whole exome sequencing (WES).

Table 5.1: Genomic coordinates for target regions in which germline and somatic risk variants were identified

Target region | Chromosome | Start coordinate | End coordinate

Identified in Chapter 3
KIF2C | 1 | 45,204,490 | 45,234,438
ZFP69B | 1 | 40,915,337 | 40,930,390
UVSSA | 4 | 1,340,104 | 1,382,837
ABCB5 | 7 | 20,654,245 | 20,797,637
ARHGAP39 | 8 | ,753, | ,839,888
BEAN1 | 16 | ,460,200 | 66,517,745
C16orf96 | 16 | ,605,491 | 4,651,318
PDIA2 | 16 | , | ,209

Identified in Chapter 4
P4HTM | 3 | 49,026,341 | 49,045,581
TET2 | 4 | ,066, | ,201,960
PLK2 | 5 | 57,748,810 | 57,756,966
SLC6A18 | 5 | 1,224,470 | 1,247,304
LAMA2 | 6 | ,203, | ,838,710
SDR16C6P,PENK | 8 | 57,286,277 | 57,359,593
ABL1 | 9 | ,588, | ,764,062
ASPN | 9 | 95,217,489 | 95,245,844
SLC22A20,POLA2 | 11 | ,980,311 | 65,066,088
ADSSL1 | 14 | ,195, | ,214,647
PRMT5 | 14 | ,388,733 | 23,399,661
FHOD3 | 18 | ,876,702 | 34,361,018
GATAD2A | 19 | 19,495,642 | 19,620,741

Genomic coordinates for each target region (± 1,000 bases) based on human genome 19 (hg19) were obtained from the University of California Santa Cruz (UCSC) Genome Browser (

The additional 13 target regions listed in Table 5.1 were identified in Chapter 4 (ABL1, ADSSL1, ASPN, FHOD3, GATAD2A, LAMA2, P4HTM, PLK2, PRMT5, SLC6A18, TET2, and two target regions encompassing SDR16C6P and PENK, and SLC22A20 and POLA2, respectively). These target regions are genes in which candidate somatic risk variants were identified by a matched tumour and germline analysis in two myxoid liposarcoma patients. For intergenic variants, both flanking genes were included. Genomic coordinates for each target region were obtained from the University of California Santa Cruz (UCSC) Genome Browser 565 using human genome build 19 (hg19) and included 1,000 bases either side of each target region. Frequency summary files for the target regions for both cases and controls (in variant call format (*.vcf)) were received and annotated using Annotate Variation (ANNOVAR, version 2015Jun16) 245, 257 and the Regulome database (RegulomeDB).

Statistical analyses

Using the annotation from ANNOVAR, the number of nonsynonymous deleterious alleles (defined as deleterious in both Sorting Intolerant from Tolerant (SIFT) and Polymorphism Phenotyping-2 (PolyPhen-2)) 266, 267 and normal alleles in each target region were summed in cases and controls. Requiring a deleterious prediction from both SIFT and PolyPhen-2 is a more conservative approach than relying on either tool alone. 566 The number of putative regulatory alleles (defined as those with a RegulomeDB score of 1a, 1b, 1c, 1d, 1e, 1f, 2a, 2b or 2c) and normal alleles in each target region were summed in cases and controls. Table 5.2 shows the classification of scores from RegulomeDB. Odds ratios (ORs) and p-values reported for the variant burden analysis were obtained from one-sided Fisher's exact tests performed in R 281 to compare the total burden of deleterious and putative regulatory variants, separately, in cases and controls, a method used previously by Ballinger et al. (2016). 178 Bonferroni adjustment was performed to correct for multiple testing.
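To make the test concrete, the following is a minimal pure-Python sketch of a one-sided Fisher's exact test on a 2 x 2 table of variant and normal alleles in cases and controls. It is an illustration under stated assumptions, not the analysis code used in this study (which was run in R): the function name and the allele counts are hypothetical, the direction of the one-sided alternative (excess in cases) is assumed, and the sketch returns the sample odds ratio, whereas R's fisher.test reports a conditional maximum likelihood estimate that can differ slightly.

```python
from math import comb
from typing import Tuple


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """One-sided Fisher's exact test (alternative: odds ratio > 1).

    Table layout (allele counts):
                   variant  normal
        cases         a       b
        controls      c       d

    Returns (sample odds ratio, p-value).
    """
    case_total = a + b
    variant_total = a + c
    n = a + b + c + d
    denominator = comb(n, case_total)
    # Sum hypergeometric probabilities for tables at least as extreme as observed,
    # i.e. with a or more variant alleles among the cases.
    k_max = min(case_total, variant_total)
    numerator = sum(
        comb(variant_total, k) * comb(n - variant_total, case_total - k)
        for k in range(a, k_max + 1)
    )
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return odds_ratio, numerator / denominator


# Hypothetical allele counts for one target region (not taken from the thesis tables):
# 561 cases contribute 1,122 alleles and 1,144 controls contribute 2,288 alleles.
odds_ratio, p_value = fisher_exact_greater(12, 1110, 5, 2283)
print(f"OR = {odds_ratio:.2f}, p = {p_value:.4g}")
```

Because the exact test conditions on the table margins, it remains valid when the number of variant alleles is small, which is the usual situation for rare deleterious variants in a single gene.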

Table 5.2: Classification of Regulome database scores

Score | Supporting data
1a | eQTL + TF binding + matched TF motif + matched DNase Footprint + DNase peak
1b | eQTL + TF binding + any motif + DNase Footprint + DNase peak
1c | eQTL + TF binding + matched TF motif + DNase peak
1d | eQTL + TF binding + any motif + DNase peak
1e | eQTL + TF binding + matched TF motif
1f | eQTL + TF binding / DNase peak
2a | TF binding + matched TF motif + matched DNase Footprint + DNase peak
2b | TF binding + any motif + DNase Footprint + DNase peak
2c | TF binding + matched TF motif + DNase peak
3a | TF binding + any motif + DNase peak
3b | TF binding + matched TF motif
4 | TF binding + DNase peak
5 | TF binding or DNase peak
6 | Other

eQTL: Expression Quantitative Trait Loci. TF: Transcription Factor. DNase: Deoxyribonuclease.
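The definition of a putative regulatory variant used in this chapter (a RegulomeDB score of 1a to 2c) amounts to a simple membership check. A minimal sketch follows; the function and constant names are hypothetical, and the input is assumed to be the score string as it appears in the annotation output.

```python
# Scores treated as putative regulatory in this chapter (categories 1 and 2 of Table 5.2).
PUTATIVE_REGULATORY_SCORES = frozenset(
    {"1a", "1b", "1c", "1d", "1e", "1f", "2a", "2b", "2c"}
)


def is_putative_regulatory(regulome_score: str) -> bool:
    """Return True when a RegulomeDB score falls in category 1 or 2."""
    return regulome_score.strip().lower() in PUTATIVE_REGULATORY_SCORES


# Example scores as they appear in Table 4.9.
for score in ("1f", "2b", "3a", "6"):
    print(score, is_putative_regulatory(score))
```

Scores of 3a and above are excluded because they lack either eQTL support or the combination of motif and footprint evidence required for categories 1 and 2.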

5.3 Results

Identification of nonsynonymous deleterious variants in the target regions

The results of the annotation of the frequency summary files using ANNOVAR and RegulomeDB are summarised in Table 5.3. On average, 1,128 variants were identified in each gene in the ISKS cohort and 2,282 in the MGRB cohort. Each gene had an average of five nonsynonymous deleterious variants in the ISKS cohort and six nonsynonymous deleterious variants in the MGRB cohort. The ISKS cohort had an average of 12 putative regulatory variants per gene compared to 11 per gene in the MGRB cohort.

Table 5.3: Annotated summary of nonsynonymous deleterious variants and putative regulatory variants in the target regions

Columns: Target region | Total variants (ISKS, MGRB) | Deleterious variants (ISKS, MGRB) | Regulatory variants (ISKS, MGRB)

KIF2C ZFP69B P4HTM TET UVSSA PLK SLC6A LAMA2 6,431 7, ABCB5 2,117 22, ARHGAP39 1,085 1, SDR16C6P,PENK 842 1, ABL1 2,173 2, ASPN

SLC22A20,POLA ADSSL PRMT BEAN C16orf PDIA FHOD3 5,215 5, GATAD2A 1,456 1,

ISKS: International Sarcoma Kindred Study. MGRB: Medical Genome Reference Bank. Deleterious variants: defined as nonsynonymous variants that are deleterious in both Sorting Intolerant from Tolerant (SIFT) and Polymorphism Phenotyping-2 (PolyPhen-2). Regulatory variants: defined as variants with a Regulome database score < 3. Number of variants corresponds to the number of deleterious or regulatory variants within each target region.

Statistical analyses

Nonsynonymous deleterious variants

Table 5.4 shows the number of nonsynonymous deleterious alleles and normal alleles for each target region for cases and controls and the results of Fisher's exact test. The significance level after Bonferroni correction was α < 2.38 x 10^-3. A table containing each nonsynonymous deleterious variant for each target region that was included in the variant burden test is located in Appendix K.
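The corrected significance level can be reproduced with a one-line calculation. The sketch below assumes a family-wise α of 0.05, which is not stated explicitly in this section but is consistent with the reported thresholds: dividing 0.05 by the 21 target regions gives 2.38 x 10^-3, and dividing by 18 gives the 2.78 x 10^-3 reported for the putative regulatory analysis, in which three regions have no result. The function name is hypothetical.

```python
def bonferroni_alpha(family_wise_alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return family_wise_alpha / n_tests


# Assumed family-wise alpha of 0.05; 21 target regions tested for deleterious variants,
# 18 target regions with a result in the putative regulatory analysis.
deleterious_threshold = bonferroni_alpha(0.05, 21)
regulatory_threshold = bonferroni_alpha(0.05, 18)

print(f"{deleterious_threshold:.2e}, {regulatory_threshold:.2e}")
```

A target region is then declared significant only if its Fisher's exact test p-value falls below the relevant threshold.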

Table 5.4: Odds ratios, p-values and 95% confidence intervals from Fisher's exact test for target regions for nonsynonymous deleterious variants

Target region | Chr. | Identified as | Odds ratio, p-value, 95% CI
KIF2C | 1 | Germline | 0 1.
ZFP69B | 1 | Germline |
P4HTM | 3 | Somatic |
UVSSA | 4 | Germline |
TET2 | 4 | Somatic | x
PLK2 | 5 | Somatic | 0 1.
SLC6A18 | 5 | Somatic | x
LAMA2 | 6 | Somatic |
ABCB5 | 7 | Germline |
ARHGAP39 | 8 | Germline |
SDR16C6P,PENK | 8 | Somatic |
ABL1 | 9 | Somatic |
ASPN | 9 | Somatic |
SLC22A20,POLA2 | 11 | Somatic |
ADSSL1 | 14 | Somatic |
PRMT5 | 14 | Somatic |
BEAN1 | 16 | Germline | 0 1.
C16orf96 | 16 | Germline | x
PDIA2 | 16 | Germline | x
FHOD3 | 18 | Somatic |
GATAD2A | 19 | Somatic |

Chr: Chromosome. ISKS: International Sarcoma Kindred Study. MGRB: Medical Genome Reference Bank. CI: Confidence interval. Odds ratios, p-values and 95% CI obtained from Fisher's exact test performed in R.

Four target regions reached statistical significance after correction for multiple testing (C16orf96, PDIA2, SLC6A18 and TET2). Of these, C16orf96 and PDIA2 were initially identified through germline variants in three cancer cluster families, and SLC6A18 and TET2 were identified through somatic variants from a matched tumour-germline analysis in two myxoid liposarcoma cases. The odds ratios in Table 5.4 indicate a higher burden of nonsynonymous deleterious alleles in sarcoma cases compared to controls for C16orf96, SLC6A18 and TET2. However, the odds ratio for PDIA2 suggests that controls have a higher burden of variant alleles compared to the sarcoma cases.

Putative regulatory variants

Table 5.5 shows the number of putative regulatory alleles and normal alleles for each target region for cases and controls and the results of Fisher's exact test. The significance level after Bonferroni correction was α < 2.78 x 10^-3. A table containing each putative regulatory variant for each target region that was included in the variant burden test is located in Appendix M.

Table 5.5: Odds ratios and p-values from Fisher's exact test for target regions for putative regulatory variants

Target region | Chr. | Identified as | Odds ratio, p-value, 95% CI
KIF2C | 1 | Germline |
ZFP69B | 1 | Germline | ...
P4HTM | 3 | Somatic |
UVSSA | 4 | Germline | x
TET2 | 4 | Somatic | x
PLK2 | 5 | Somatic |
SLC6A18 | 5 | Somatic |
LAMA2 | 6 | Somatic | x
ABCB5 | 7 | Germline | x
ARHGAP39 | 8 | Germline | x
SDR16C6P,PENK | 8 | Somatic |
ABL1 | 9 | Somatic | x
ASPN | 9 | Somatic | ...
SLC22A20,POLA2 | 11 | Somatic | x
ADSSL1 | 14 | Somatic |
PRMT5 | 14 | Somatic | ...
BEAN1 | 16 | Germline |
C16orf96 | 16 | Germline | x
PDIA2 | 16 | Germline |
FHOD3 | 18 | Somatic | x
GATAD2A | 19 | Somatic |

Chr: Chromosome. ISKS: International Sarcoma Kindred Study. MGRB: Medical Genome Reference Bank. CI: Confidence interval. Odds ratios, p-values and 95% CI obtained from Fisher's exact test performed in R.

Nine target regions reached statistical significance after correction for multiple testing (ABCB5, ARHGAP39, C16orf96, UVSSA, ABL1, FHOD3, LAMA2, TET2 and a region encompassing SLC22A20 and POLA2). Of these, ABCB5, ARHGAP39, C16orf96 and UVSSA were identified through germline variants in three cancer cluster families, and ABL1, FHOD3, LAMA2, TET2 and the region encompassing SLC22A20 and POLA2 were identified through somatic variants from a matched tumour-germline analysis in two myxoid liposarcoma cases. The odds ratios indicate a higher burden of putative regulatory variants in sarcoma cases compared to controls for ARHGAP39, ABL1 and the region encompassing SLC22A20 and POLA2. However, the odds ratios for ABCB5, C16orf96, UVSSA, FHOD3, LAMA2, and TET2 indicate that controls have a higher burden of variant alleles compared to the sarcoma cases.

5.4 Discussion

A total of six target regions of interest (C16orf96, SLC6A18, TET2, ARHGAP39, ABL1 and a region encompassing SLC22A20 and POLA2) were found to have a higher burden of nonsynonymous deleterious variants or putative regulatory variants in 561 sarcoma cases compared to 1,144 healthy ageing controls.

Novel findings

This is the first study to report associations between the C16orf96, SLC6A18, ARHGAP39, POLA2 and SLC22A20 genes and sarcoma. None of these genes were reported by Ballinger et al. or Brohl et al. in their variant burden analyses in sarcoma cohorts. 178, 559

C16orf96 (chromosome 16 open reading frame 96) is an uncharacterised protein-coding gene. The function of C16orf96 is currently unknown, and expression is generally low in cells. In situ hybridisation experiments have shown that C16orf96 RNA expression is low in testis and skin and absent in other tissue types. 567 Neither the function of this gene nor any potential role in cancer pathogenesis has been established.

The SLC6A18 gene is a member of the SLC6 specific transporter family. SLC6A18 is involved in the transport of glucose and other sugars, bile salts and organic acids, metal ions and amine compounds. A previous study reported a gain of region 5p15.33 containing SLC6A18 in small cell lung cancers. 463 Copy number variations in SLC6A18 have also been reported in lung adenocarcinoma. 568

The protein encoded by ARHGAP39 is a binding partner for CNK2 and acts as a spatial modulator of Rac cycling during spine morphogenesis and signalling by G protein coupled receptors (GPCR). 297 There is currently no supporting evidence for a role for ARHGAP39 in cancer pathogenesis.

SLC22A20 is a member of the solute carrier family that plays a role in inorganic anion exchanger activity. SLC22A20 is differentially methylated in hepatocellular carcinoma and may be used as a biomarker for early detection. 321, 442, 443

The POLA2 gene has been reported to be involved in cell proliferation by mediating DNA replication, recombination, and repair. 444 A variant in POLA2 has been associated with differential survival and mortality in non-small cell lung cancer patients and could be used as a prognostic biomarker. 445, 448 Low mRNA expression of POLA2 was found to be prognostic of poor outcome in ovarian carcinomas. 446 Additionally, POLA2 was found to be overexpressed in mesothelioma. 449

The role of C16orf96, SLC6A18, ARHGAP39, SLC22A20 and POLA2 in sarcoma pathogenesis remains to be elucidated. The results of this study support prioritising further research on these genes in sarcomas.

Known cancer genes

Both the TET2 and ABL1 genes are known cancer genes listed in the Catalogue of Somatic Mutations in Cancer (COSMIC) cancer gene census. 134 TET2 is reported to be frequently mutated or inactivated in cancer, and mutations are commonly observed in myeloid, lymphoid and haematological malignancies. TET2 has previously been associated with sarcomas.
The loss of TET2 is a characteristic of myeloid sarcomas and may be used as a novel marker. 569, 570

ABL1 is a proto-oncogene that encodes a protein tyrosine kinase involved in a variety of cellular processes, including cell division, adhesion, differentiation, and response to stress. 571 This gene is known to be fused to a variety of translocation partner genes in various leukaemias, for example, chronic myelogenous leukaemia (BCR-ABL1). 572 ABL kinases may also play a role in solid tumours including breast, colon, lung and kidney carcinomas, and melanoma. ABL1 variants have previously been reported in sarcomas. Two patients with chronic myeloid leukaemia and secondary sarcomas (histiocytic sarcoma and segregated extramedullary (nodal) myeloid sarcoma) were found to be positive for the t(9;22) BCR/ABL1 translocation in the sarcoma tumours. 583, 584 This evidence suggests that the lineages may be clonally related. 583, 584 However, there is no evidence of ABL1 variants in sarcoma cases without chronic myeloid leukaemia.

Clinical implications

Three of the regions of interest identified in this study may have clinical implications for the treatment of sarcomas. TET2 is listed in the Genomics of Drug Sensitivity in Cancer database and shows a statistically significant association (p-value < 10^-3) with VNLG/124 and Bexarotene. 391 There may be myeloid sarcomas among the ISKS cases sequenced in this study that harbour TET2 mutations and may respond to VNLG/124 or Bexarotene. However, other sarcoma subtypes may also harbour TET2 mutations. The role of TET2 in sarcoma subtypes other than myeloid sarcoma, and the treatment of sarcomas carrying TET2 variants with VNLG/124 and Bexarotene, should be investigated further.

ABL1 is associated with trabectedin sensitivity in myxoid liposarcomas. 489 Therefore, there may be an opportunity to treat other sarcoma subtypes that exhibit ABL1 variants with trabectedin. An expanded access program tested trabectedin in patients with incurable soft tissue sarcoma following the progression of disease with standard therapy.
585 Results of the study demonstrated disease control despite a low incidence of objective responses in advanced soft tissue sarcoma patients after failure of standard chemotherapy. 585 The study also found greater clinical benefit rate and longer median overall survival in patients with 140

172 leiomyosarcoma and liposarcoma compared with patients with histopathologic subsets of sarcomas other than leiomyosarcoma and liposarcoma. 585 A second study that evaluated the effectiveness of trabectedin for patients with soft tissue sarcoma also found there may be a benefit in using trabectedin in patients with leiomyosarcoma or liposarcoma who failed standard of care agents. 586 The SCL22A20 gene offers some interest and potential clinical utility as an uptake carrier of sorafenib, a multikinase inhibitor. 442 Sorafenib has been shown 587, 588 to have activity in metastatic soft tissue sarcoma, specifically in leiomyosarcoma Strengths and limitations Classic single-marker association analysis for rare variants are underpowered unless the sample size is extremely large, or the variants have a large effect size. 558, 589 Consequently, burden tests for the analysis of rare genetic variants have been developed that consider their joint effects on complex traits within the same functional unit or genomic region. The burden test makes assumptions that all variants in a region are causal and associated with a trait in the same direction and magnitude of effect. 590 Violation of these assumptions can reduce the power of the test For the variants identified in a genomic region by WES and WGS, like in this study, some variants will have little or no effect on the phenotype, some variants may be protective, and some may be deleterious. The magnitude of the effect of each variant may also vary. For example, rare variants may have a larger effect compared to common variants. Some burden tests, for example, sequence kernel association tests, take violations of these assumptions into consideration. 592 However, as only frequency summary files for each cohort were available, the breach of these assumptions could not be addressed at this time. 141
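To make the burden-test idea concrete, the following is a minimal sketch of a collapsing test that works from per-cohort frequency summaries alone. It is an illustration under stated assumptions, not the pipeline used in this thesis: the function names, the use of a one-sided Fisher's exact test, and the allele counts are all hypothetical. Only the cohort sizes (561 cases and 1,144 controls) come from the text, and the allele numbers assume diploid autosomal sites.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact p-value, P(X >= a), for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)
    upper = min(row1, col1)
    # Sum hypergeometric probabilities for tables at least as extreme as observed.
    numer = sum(comb(row1, k) * comb(row2, col1 - k) for k in range(a, upper + 1))
    return numer / denom


def collapsed_burden_p(case_variants, control_variants):
    """Collapse (alt_count, allele_number) pairs over a region and test for
    enrichment of alternate alleles in cases (hypothetical helper)."""
    case_alt = sum(ac for ac, _ in case_variants)
    case_ref = sum(an - ac for ac, an in case_variants)
    ctrl_alt = sum(ac for ac, _ in control_variants)
    ctrl_ref = sum(an - ac for ac, an in control_variants)
    return fisher_exact_greater(case_alt, case_ref, ctrl_alt, ctrl_ref)


# Hypothetical counts for one region with two variant sites:
# 561 cases give 1,122 chromosomes and 1,144 controls give 2,288 per site.
cases = [(3, 1122), (2, 1122)]
controls = [(1, 2288), (1, 2288)]
p_value = collapsed_burden_p(cases, controls)
print(f"one-sided burden p-value: {p_value:.4f}")
```

Because the counts are summed before testing, a protective variant and a deleterious variant in the same region offset each other, which is the loss of power described above when the same-direction assumption is violated.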

Several regions of interest were also found to have a higher mutational rate in controls compared to cases. PDIA2 was found to have a higher rate of nonsynonymous deleterious variants in controls compared to cases. ABCB5, C16orf96, UVSSA, FHOD3, LAMA2 and TET2 were found to have a higher rate of putative regulatory variants in controls compared to cases. This may be due to the presence of common minor alleles in the general population (see Appendix K for minor allele frequencies (MAF) for each variant) or the presence of variants that are phenotypically neutral. Two of the regions of interest (TET2 and C16orf96) were found to have a higher mutational rate of nonsynonymous deleterious variants in cases compared to controls, but a higher mutational rate of putative regulatory variants in controls. This may also be due to the presence of common minor alleles classified as putative regulatory variants (see Appendix M for MAF for each variant). For example, two putative regulatory variants in C16orf96 have a MAF of 0.61 and 1.00 in the general population and may therefore be phenotypically neutral, whereas the nonsynonymous deleterious variants in C16orf96 had MAF < 2%. Likewise, TET2 nonsynonymous variants had MAF < 2% whereas one putative regulatory variant had a MAF of [...]. Due to these findings of higher mutational rates in controls compared to cases and contradictory findings between nonsynonymous deleterious variants and putative regulatory variants for C16orf96 and TET2, further studies are required to confirm these gene-level associations.

Conclusion

In conclusion, six target regions that were identified by WES in cancer cluster families and matched tumour and germline analysis of two myxoid liposarcomas have been validated using a large independent case and control cohort. C16orf96, SLC6A18 and TET2 were found to have a higher mutational burden of nonsynonymous deleterious variants in sarcoma cases compared to healthy ageing controls. A higher mutational burden of putative regulatory variants in cases was found in ARHGAP39, ABL1 and a region encompassing SLC22A20 and POLA2. This study reported five novel associations between C16orf96, SLC6A18, ARHGAP39, POLA2 and SLC22A20 and sarcoma. Two of the validated genes, TET2 and ABL1, are known cancer genes and have potential clinical utility as they have been identified to contribute to drug sensitivity in cancers. This study has identified novel risk genes that appear to have a higher mutational burden in sarcoma cases compared to healthy ageing controls and should be prioritised for further research.


Chapter 6 Conclusion

6.1 Summary of results

Whole exome sequencing (WES) was performed on three mixed cancer cluster families identified by a sarcoma proband from the International Sarcoma Kindred Study (ISKS). The cancer cluster families selected were not defined by known cancer predisposition syndromes and therefore represented an opportunity to identify novel risk variants associated with both sarcoma and cancer risk. The WES data was annotated, filtered and prioritised using three different strategies to identify rare private variants, known rare variants and candidate gene variants. The prioritised variants were then tested for association with cancer phenotypes using Sequential Oligogenic Linkage Analysis Routines (SOLAR). Nominally significant variants were assessed for familial segregation in each cancer cluster family. Eight novel putative germline risk variants were identified to segregate with cancer in the families. Each variant was private to a single family and showed segregation with mixed cancer types. These findings suggest the presence of inherited cancer mutations that may increase the risk for cancer within families.

Matched tumour and germline analyses were performed on two myxoid liposarcoma cases from the cancer cluster families. VarScan2 and Strelka were used to identify 13 novel statistically significant somatic mutations. A large region of loss of heterozygosity and somatic copy number alterations on chromosome 16 encompassing the RBL2 and FUS genes was also identified in one of the tumours, which may contribute towards tumour pathogenesis.

Target regions in which germline and somatic mutations were identified in the cancer cluster families were validated using variant burden analyses in 561 sarcoma cases and 1,144 healthy ageing controls. Six target regions showed an increased mutational burden of nonsynonymous deleterious variants (C16orf96, SLC6A18 and TET2) or putative regulatory variants (ARHGAP39, ABL1 and a region encompassing SLC22A20 and POLA2) in sarcoma cases compared to controls.

6.2 Clinical utility of findings

Two target regions that were found to have a higher mutational burden in sarcoma cases (TET2 and ABL1) are known cancer genes and have potential clinical utility in the treatment of sarcomas as they have both been identified to contribute to drug sensitivity in cancers. Also, the SLC22A20 gene offers potential clinical utility as an uptake carrier of sorafenib. TET2 and ABL1 have been reported to be associated with myeloid sarcomas and secondary sarcomas in patients with chronic myeloid leukaemia, respectively. However, there is no evidence of association with other sarcoma subtypes.

The remaining genes identified in this study represent novel candidate risk genes for sarcoma. The POLA2 gene has been reported to be involved in cell proliferation by mediating DNA replication, recombination, and repair. 444 The role of the remaining genes of interest (C16orf96, ARHGAP39 and SLC6A18) in cancer pathogenesis remains to be elucidated.

As previously observed by Ballinger et al. and consistent with the findings of the current study, there is a burden of clinically relevant genetic variation in sarcoma patients and their families. 178 The results from this study will be returned to the ISKS coordinators and submitted to a central database. The database contains molecular and biological information that has been collected over time on the ISKS families and specimens. It is critical to catalogue genetic variants, as future studies of these candidates may provide a further understanding of the aetiology of sarcoma, or new therapies that target these candidates may be developed.

6.3 Review of methodology

The current study was the first to perform WES in mixed cancer cluster families identified by a sarcoma proband. This study is an example of a successful two-phase next generation sequencing family study approach: the application of WES to cancer cluster families with rare cancers, followed by larger replication in independent population cohorts. The results of the current study show the utility of this approach in small cancer cluster families to identify novel risk genes for a rare disease, such as sarcoma.

The current study was limited by the size of the initial study sample (19 people in three families), the assumptions used for variant filtering, prioritisation and segregation analysis, and the availability of tumour DNA. The validation using variant burden analysis was also limited by the inability to account for risk, neutral and protective alleles. The current state of bioinformatic tools, databases and knowledge of cancer biology underpinned the study design and analyses performed. The WES data generated in this study may be re-analysed in the future as new tools are developed and/or the results may become clinically relevant as knowledge in this field progresses.

The validation of findings from WES (both germline in families and tumour-germline comparison in myxoid liposarcomas) does not provide conclusive evidence of an involvement of these genes in sarcoma pathogenesis. Rather, the results of this study should be seen as hypothesis-generating for novel candidate risk genes that should be prioritised for future research.

6.4 Recommendations for future work

The current study has identified novel candidate risk genes for sarcoma by performing WES in a small number of cancer cluster families. The role of these genes in sarcoma pathogenesis has not been elucidated in this study and was beyond the scope of this thesis. These genes, however, become candidates that can be further tested for association in other sarcoma and cancer cohorts and for functional validation studies such as molecular assays to determine expression or interactions, or biological assays in animal models.

The two-phase NGS family study approach is gaining momentum in the genomics literature as researchers return to family-based study designs to identify rare genetic variants. The current study adds to the growing evidence that this approach can be successfully used to identify novel risk genes for a rare complex disease such as sarcoma, and may be extended to identify novel risk genes for other complex diseases.

Bibliography

1 Gerard I. Evan and Karen H. Vousden. Proliferation, cell cycle and apoptosis in cancer. Nature, 411(6835).
2 SEER Training Modules. Cancer classification. Technical report, U.S. National Institutes of Health, National Cancer Institute.
3 Fred Bunz. Principles of cancer genetics. Springer, Netherlands, 1st edition.
4 Geoffrey M. Cooper and Robert E. Hausman. The development and causes of cancer. In The Cell: A Molecular Approach. Sinauer Associates, Sunderland, 2nd edition.
5 Jacques Ferlay, Isabelle Soerjomataram, Rajesh Dikshit, Sultan Eser, Colin Mathers, and Marise Rebelo et al. Cancer incidence and mortality worldwide: sources, methods and major patterns in GLOBOCAN. International Journal of Cancer, 136(5):E359-E386.
6 World Health Organization. Global health observatory: the data repository.
7 World Health Organization. Health in 2015: from MDGs to SDGs. World Health Organization, Geneva.
8 Rijo John and Hana Ross. The global economic cost of cancer. Technical report, American Cancer Society.
9 Bert Vogelstein and Kenneth W. Kinzler. Cancer genes and the pathways they control. Nature Medicine, 10(8).

10 Douglas Hanahan and Robert A. Weinberg. The hallmarks of cancer. Cell, 100(1):57-70.
11 Keith R. Loeb and Lawrence A. Loeb. Significance of multiple mutations in cancer. Carcinogenesis, 21(3).
12 Roshan Karki, Deep Pandya, Robert C. Elston, and Cristiano Ferlini. Defining mutation and polymorphism in the era of personal genomics. BMC Medical Genomics, 8(1):1.
13 Michael R. Stratton, Peter J. Campbell, and P. Andrew Futreal. The cancer genome. Nature, 458(7239).
14 Simon J. Talbot and Dorothy H. Crawford. Viruses and tumours - an update. European Journal of Cancer, 40(13).
15 Peter A. Jones and Stephen B. Baylin. The fundamental role of epigenetic events in cancer. Nature Review Genetics, 3(6).
16 Bert Vogelstein, Nickolas Papadopoulos, Victor E. Velculescu, Shibin Zhou, Luis A. Diaz, and Kenneth W. Kinzler. Cancer genome landscapes. Science, 339(6127).
17 Christopher Greenman, Philip Stephens, Raffaella Smith, Gillian L. Dalgliesh, Christopher Hunter, and Graham Bignell et al. Patterns of somatic mutation in human cancer genomes. Nature, 446(7132).
18 Daniel G. Miller. On the nature of susceptibility to cancer. The presidential address. Cancer, 46(6).
19 Anna C. Schinzel and William C. Hahn. Oncogenic transformation and experimental models of human cancer. Frontiers in Bioscience, 13:71-84.
20 Niko Beerenwinkel, Tibor Antal, David Dingli, Arne Traulsen, Kenneth W. Kinzler, and Victor E. Velculescu et al. Genetic progression and the waiting time to cancer. PLOS Computational Biology, 3(11):e225.

21 Pawan Upadhyay, Renu Dwivedi, and Amit Dutt. Applications of next-generation sequencing in cancer. Current Science, 107(5):795.
22 International Agency for Research on Cancer. World cancer report. Technical report, World Health Organisation.
23 Australian Institute of Health and Welfare & Australasian Association of Cancer Registries. Cancer in Australia: an overview. Technical report, AIHW.
24 Julian Peto. Cancer epidemiology in the last century and the next decade. Nature, 411(6835).
25 Tracey DiSipio, Carla Rogers, Beth Newman, David Whiteman, Elizabeth Eakin, Lin Fritschi, and Joanne Aitken. The Queensland cancer risk study: behavioural risk factor results. Australian and New Zealand Journal of Public Health, 30(4).
26 Elizabeth B. Claus, Joellen M. Schildkraut, Douglas W. Thompson, and Neil J. Risch. The genetic attributable risk of breast and ovarian cancer. Cancer, 77(11).
27 Lauri A. Aaltonen, Reijo Salovaara, Paula Kristo, Federico Canzian, Akseli Hemminki, and Paivi Peltomaki et al. Incidence of hereditary nonpolyposis colorectal cancer and the feasibility of molecular screening for the disease. New England Journal of Medicine, 338(21).
28 Agnes Chompret, Laurence Brugieres, Muriel Ronsin, Maryvonne Gardes, Francoise Dessarps-Freichey, and Anne Abel et al. P53 germline mutations in childhood cancers and cancer risk for carrier individuals. British Journal of Cancer, 82(12):1932.
29 Carlo La Vecchia, Eva Negri, Antonella Gentile, and Silvia Franceschi. Family history and the risk of stomach and colorectal cancer. Cancer, 70(1):50-55.
30 Gianni Zanghieri, Carmela Di Gregorio, Carla Sacchetti, Rossella Fante, Romano Sassatelli, and Giacomo Cannizzo et al. Familial occurrence of gastric cancer in the 2-year experience of a population-based registry. Cancer, 66(9).
31 Shirley Hodgson. Mechanisms of inherited cancer susceptibility. Journal of Zhejiang University. Science. B, 9(1):1-4.
32 Knut Borch-Johnsen, Jorgen H. Olsen, and Thorkild I.A. Sorensen. Genes and family environment in familial clustering of cancer. Theoretical Medicine, 15(4).
33 Kari Hemminki, Jan Sundquist, and Justo L. Bermejo. How common is familial cancer? Annals of Oncology, 19(1).
34 David E. Goldgar, Douglas F. Easton, Lisa A. Cannon-Albright, and Mark H. Skolnick. Systematic population-based assessment of cancer risk in first-degree relatives of cancer probands. Journal of the National Cancer Institute, 86(21).
35 Frederick P. Li, Joseph F. Fraumeni, John J. Mulvihill, William A. Blattner, Margaret G. Dreyfus, Margaret A. Tucker, and Robert W. Miller. A cancer family syndrome in twenty-four kindreds. Cancer Research, 48(18).
36 Janice L. Berliner and Angela Musial Fay. Risk assessment and genetic counseling for hereditary breast and ovarian cancer: recommendations of the national society of genetic counselors. Journal of Genetic Counseling, 16(3).
37 Kari Hemminki, Mahdi Fallah, and Akseli Hemminki. Collection and use of family history in oncology clinics. Journal of Clinical Oncology, 32(29).
38 Paul Lichtenstein, Niels V. Holm, Pia K. Verkasalo, Anastasia Iliadou, Jaakko Kaprio, and Markku Koskenvuo et al. Environmental and heritable factors in the causation of cancer - analyses of cohorts of twins from Sweden, Denmark, and Finland. New England Journal of Medicine, 343(2):78-85.

39 Frederick P. Li and Joseph F. Fraumeni. Prospective study of a family cancer syndrome. The Journal of the American Medical Association, 247(19).
40 Anthony Antoniou, Paul D.P. Pharoah, Steven Narod, Harvey A. Risch, Jorunn E. Eyfjord, and John L. Hopper et al. Average risks of breast and ovarian cancer associated with BRCA1 or BRCA2 mutations detected in case series unselected for family history: a combined analysis of 22 studies. The American Journal of Human Genetics, 72(5).
41 Harvey A. Risch, John R. McLaughlin, David E.C. Cole, Barry Rosen, Linda Bradley, and Elaine Kwan et al. Prevalence and penetrance of germline BRCA1 and BRCA2 mutations in a population series of 649 women with ovarian cancer. The American Journal of Human Genetics, 68(3).
42 Henry T. Lynch and Albert de la Chapelle. Hereditary colorectal cancer. New England Journal of Medicine, 348(10).
43 Alfred G. Knudson. Mutation and cancer: statistical study of retinoblastoma. Proceedings of the National Academy of Sciences, 68(4).
44 Abha Gupta and David Malkin. Sarcomas and cancer predisposition syndromes.
45 Judy E. Garber and Kenneth Offit. Hereditary cancer predisposition syndromes. Journal of Clinical Oncology, 23(2).
46 Csilla I. Szabo and Mary-Claire King. Inherited breast and ovarian cancer. Human Molecular Genetics, 4(suppl 1).
47 Mary-Claire King, Joan H. Marks, and Jessica B. Mandell. Breast and ovarian cancer risks due to inherited mutations in BRCA1 and BRCA2. Science, 302(5645).

48 Sining Chen, Edwin S. Iversen, Tara Friebel, Dianne Finkelstein, Barbara L. Weber, and Andrea Eisen et al. Characterization of BRCA1 and BRCA2 mutations in a large United States sample. Journal of Clinical Oncology, 24(6).
49 Eric R. Fearon. Human cancer syndromes: clues to the origin and nature of cancer. Science, 278(5340):1043.
50 Ichiro Satokata, Kiyoji Tanaka, Naoyuki Miura, Michiko Narita, Takashi Mimaki, and Yoshiaki Satoh et al. Three nonsense mutations responsible for group A xeroderma pigmentosum. Mutation Research/DNA Repair, 273(2).
51 David Malkin, Frederick P. Li, Louise C. Strong, Joseph F. Fraumeni, Camille E. Nelson, and David H. Kim et al. Germline p53 mutations in a familial syndrome of breast cancer, sarcomas, and other neoplasms. Science, 250(4985).
52 Frederick P. Li and Joseph F. Jr Fraumeni. Soft-tissue sarcomas, breast cancer, and other neoplasms: a familial syndrome? Annals of Internal Medicine, 71(4).
53 David Malkin, Kent W. Jolly, Noele Barbier, A. Thomas Look, Stephen H. Friend, and Mark C. Gebhardt et al. Germline mutations of the p53 tumor-suppressor gene in children and young adults with second malignant neoplasms. New England Journal of Medicine, 326(20).
54 Arnold J. Levine. P53, the cellular gatekeeper for growth and division. Cell, 88(3).
55 Amato J. Giaccia and Michael B. Kastan. The complexity of p53 modulation: emerging patterns from divergent signals. Genes & Development, 12(19).
56 Charles J. Sherr and Frank McCormick. The RB and p53 pathways in cancer. Cancer Cell, 2(2).
57 Fattaneh A. Tavassoli, Peter Devilee, and World Health Organization. Tumours of the breast and female genital organs - pathology and genetics. World Health Organization Classification of Tumours. Lyon, France: IARC Press.
58 Laufey T. Amundadottir, Sverrir Thorvaldsson, Daniel F. Gudbjartsson, Patrick Sulem, Kristleifur Kristjansson, and Sigurdur Arnason et al. Cancer as a complex phenotype: pattern of cancer distribution within and beyond the nuclear family. PLOS Medicine, 1(3):e65.
59 Iona Cheng, Jonathan M. Kocarnik, Logan Dumitrescu, Noralane M. Lindor, Jenny Chang-Claude, and Christy L. Avery et al. Pleiotropic effects of genetic risk variants for other cancers on colorectal cancer risk: PAGE, GECCO and CCFR consortia. Gut, 63(5).
60 Lisa A. Cannon-Albright, Alun Thomas, David E. Goldgar, Khosrow Gholami, Kerry Rowe, and Matt Jacobsen et al. Familiality of cancer in Utah. Cancer Research, 54(9).
61 Pauli Vaittinen and Kari Hemminki. Familial cancer risks in offspring from discordant parental cancers. International Journal of Cancer, 81(1):12-19.
62 Chuanhui Dong and Kari Hemminki. Modification of cancer risks in offspring by sibling and parental cancers from 2,112,616 nuclear families. International Journal of Cancer, 92(1).
63 Kamila Czene, Paul Lichtenstein, and Kari Hemminki. Environmental and heritable causes of cancer among 9.6 million individuals in the Swedish family-cancer database. International Journal of Cancer, 99(2).
64 Christopher D.M. Fletcher and World Health Organization. WHO classification of tumours of soft tissue and bone. International Agency for Research on Cancer.
65 Zachary Burningham, Mia Hashibe, Logan Spector, and Joshua Schiffman. The epidemiology of sarcoma. Clinical Sarcoma Research, 2(1):14.
66 Guy Lahat, Alexander Lazar, and Dina Lev. Sarcoma epidemiology and etiology: potential environmental and genetic factors. Surgical Clinics of North America, 88(3).

67 John R. Goldblum, Sharon W. Weiss, and Andrew L. Folpe. Enzinger and Weiss's soft tissue tumors. Elsevier Health Sciences.
68 Fritz Schajowicz. Histological typing of bone tumours. Springer Science & Business Media.
69 W. Archie Bleyer. Cancer in older adolescents and young adults: epidemiology, diagnosis, treatment, survival, and importance of clinical trials. Medical and Pediatric Oncology, 38(1):1-10.
70 W. Archie Bleyer, Troy Budd, and Michael Montello. Adolescents and young adults with cancer. Cancer, 107(S7).
71 Ernest K. Amankwah, Anthony P. Conley, and Damon R. Reed. Epidemiology and therapies for metastatic sarcoma. Clinical Epidemiology, 5.
72 Australasian Association of Cancer Registries. Cancer in Australia 1998: incidence and mortality data. Technical report, Australian Institute of Health and Welfare.
73 Kasmintan A. Schrader, Donavan T. Cheng, Vijai Joseph, Meera Prasad, Michael Walsh, and Ahmet Zehir et al. Germline variants in targeted tumor sequencing using matched normal DNA. JAMA Oncology, 2(1).
74 Jinghui Zhang, Michael F. Walsh, Gang Wu, Michael N. Edmonson, Tanja A. Gruber, and John Easton et al. Germline mutations in predisposition genes in pediatric cancer. New England Journal of Medicine, 373(24).
75 Fabio Levi, Lalao Randimbison, Manuela Maspoli-Conconi, Rafael Blanc-Moya, and Carlo La Vecchia. Incidence of second sarcomas: a cancer registry-based study. Cancer Causes & Control, 25(4).
76 Josefin Fernebro, Anna Bladstrom, Anders Rydholm, Pelle Gustafson, Hakan Olsson, Jacob Engellau, and Mef Nilbert. Increased risk of malignancies in a population-based study of 818 soft-tissue sarcoma patients. British Journal of Cancer, 95(8).

77 Ruth A. Kleinerman, Sara J. Schonfeld, and Margaret A. Tucker. Sarcomas in hereditary retinoblastoma. Clinical Sarcoma Research, 2.
78 Michael A. Postow and Mark E. Robson. Inherited gastrointestinal stromal tumor syndromes: mutations, clinical features, and therapeutic implications. Clinical Sarcoma Research, 2.
79 D. Gareth R. Evans, Susan M. Huson, and Jillian M. Birch. Malignant peripheral nerve sheath tumours in inherited disease. Clinical Sarcoma Research, 2.
80 Junya Toguchida, Toshikazu Yamaguchi, Siri H. Dayton, Roberta L. Beauchamp, Guillermo E. Herrera, and Kanji Ishizaki et al. Prevalence and spectrum of germline mutations of the p53 gene among patients with sarcoma. New England Journal of Medicine, 326(20).
81 Shih-Jen Hwang, Guillermina Lozano, Christopher I. Amos, and Louise C. Strong. Germline p53 mutations in a cohort with childhood sarcoma: sex differences in cancer risk. The American Journal of Human Genetics, 72(4).
82 Amy Berrington de Gonzalez, Alina Kutsenko, and Preetha Rajaraman. Sarcoma risk after radiation exposure. Clinical Sarcoma Research, 2(1):1.
83 Lee J. Helman and Paul Meltzer. Mechanisms of sarcoma development. Nature Reviews Cancer, 3(9).
84 Kishor Bhatia, Meredith S. Shiels, Alexandra Berg, and Eric A. Engels. Sarcomas other than Kaposi sarcoma occurring in immunodeficiency: interpretations from a systematic literature review. Current Opinion in Oncology, 24(5):537.
85 Denise Whitby, Chris Boshoff, T. Hatzioannou, Robert A. Weiss, Thomas F. Schulz, and Mark R. Howard et al. Detection of Kaposi sarcoma associated herpesvirus in peripheral blood of HIV-infected individuals and progression to Kaposi's sarcoma. The Lancet, 346(8978).

86 R. Balarajan and Ernest D. Acheson. Soft tissue sarcomas in agriculture and forestry workers. Journal of Epidemiology and Community Health, 38(2).
87 Diego Serraino, Silvia Franceschi, Carlo La Vecchia, and Antonino Carbone. Occupation and soft-tissue sarcoma in northeastern Italy. Cancer Causes & Control, 3(1):25-30.
88 Gun Wingren, Mats Fredrikson, H. Noorlind Brage, Bo Nordenskjold, and Olav Axelson. Soft tissue sarcoma and occupational exposures. Cancer, 66(4).
89 Franco Merletti, Lorenzo Richiardi, Franco Bertoni, Wolfgang Ahrens, Antoine Buemi, and Cristina Costa-Santos et al. Occupational factors and risk of adult bone sarcomas: A multicentric case-control study in Europe. International Journal of Cancer, 118(3).
90 Eero Pukkala, Jan Ivar Martinsen, Elsebeth Lynge, Holmfridur Kolbrun Gunnarsdottir, Par Sparen, and Laufey Tryggvadottir et al. Occupation and cancer - follow-up of 15 million people in five Nordic countries. Acta Oncologica, 48(5).
91 Mikael Eriksson, Lennart Hardell, and Hans-Olov Adami. Exposure to dioxins as a risk factor for soft tissue sarcoma: A population-based case-control study. Journal of the National Cancer Institute, 82(6).
92 Jane A. Hoppin, Paige E. Tolbert, W. Dana Flanders, Rebecca H. Zhang, Danni S. Daniels, Bruce D. Ragsdale, and Edward A. Brann. Occupational risk factors for sarcoma subtypes. Epidemiology, 10(3).
93 Manolis Kogevinas, Timo Kauppinen, Regina Winkelmann, Heiko Becher, Pier Alberto Bertazzi, and H. Bas Bueno-de-Mesquita et al. Soft tissue sarcoma and non-Hodgkin's lymphoma in workers exposed to phenoxy herbicides, chlorophenols, and dioxins: two nested case-control studies. Epidemiology, 6(4).
94 Lennart Hardell and Mikael Eriksson. The association between soft tissue sarcomas and exposure to phenoxyacetic acids. Cancer, 62(3).

95 J. Gustav Smith and Allen J. Christophers. Phenoxy herbicides and chlorophenols: a case control study on soft tissue sarcoma and malignant lymphoma. British Journal of Cancer, 65(3):442.
96 James S. Woods, Lincoln Polissar, Richard K. Severson, LS. Heuser, and Bruce G. Kulander. Soft tissue sarcoma and non-Hodgkin's lymphoma in relation to phenoxyherbicide and chlorinated phenol exposure in western Washington. Journal of the National Cancer Institute, 78(5).
97 Francesca Fioretti, Alessandra Tavani, Silvano Gallus, Eva Negri, Silvia Franceschi, and Carlo La Vecchia. Menstrual and reproductive factors and risk of soft tissue sarcomas. Cancer, 88(4).
98 Kristin P. Anfinsen, Susan S. Devesa, Freddie Bray, Rebecca Troisi, Thora J. Jonasdottir, Oyvind S. Bruland, and Tom Grotmol. Age-period-cohort analysis of primary bone cancer incidence rates in the United States. Cancer Epidemiology Biomarkers & Prevention, 20(8).
99 Deborah M. Winn, Frederick P. Li, Leslie L. Robison, John J. Mulvihill, Ann E. Daigle, and Joseph F. Fraumeni. A case-control study of the etiology of Ewing's sarcoma. Cancer Epidemiology Biomarkers & Prevention, 1(7).
100 Seymour Grufferman, Helen H. Wang, Elizabeth R. DeLong, Sue Y.S. Kimm, Elizabeth S. Delzell, and John M. Falletta. Environmental factors in the etiology of rhabdomyosarcoma in childhood. Journal of the National Cancer Institute, 68(1).
101 Ann L. Hartley, Jillian M. Birch, Henry B. Marsden, Martin Harris, and Val Blair. Neurofibromatosis in children with soft tissue sarcoma. Pediatric Hematology and Oncology, 5(1):7-16.
102 Lisa Mirabello, Ruth Pfeiffer, Gwen Murphy, Najat C. Daw, Ana Patino-Garcia, and Rebecca J. Troisi et al. Height at diagnosis and birth-weight as risk factors for osteosarcoma. Cancer Causes & Control, 22(6).

103 Logan G. Spector, Susan E. Puumala, Susan E. Carozza, Eric J. Chow, Erin E. Fox, and Scott Horel et al. Cancer risk among children with very low birth weights. Pediatrics, 124(1):96-104.
104 Simona Ognjanovic, Susan E. Carozza, Eric J. Chow, Erin E. Fox, Scott Horel, and Colleen C. McLaughlin et al. Birth characteristics and the risk of childhood rhabdomyosarcoma based on histological subtype. British Journal of Cancer, 102(1).
105 Julie Von Behren, Logan G. Spector, Beth A. Mueller, Susan E. Carozza, Eric J. Chow, and Erin E. Fox et al. Birth order and risk of childhood cancer: a pooled analysis from five US States. International Journal of Cancer, 128(11).
106 Felix Mitelman, Bertil Johansson, and Fredrik Mertens. Mitelman database of chromosome aberrations and gene fusions in cancer.
107 Shujuan J. Xia and Frederic G. Barr. Chromosome translocations in sarcomas and the emergence of oncogenic transcription factors. European Journal of Cancer, 41(16).
108 Surbhi Jain, Lori W. McGinnes, and Trudy G. Morrison. Thiol/disulfide exchange is required for membrane fusion directed by the Newcastle disease virus fusion protein. Journal of Virology, 81(5).
109 Brian P. Rubin, Samuel Singer, Connie Tsao, Anette Duensing, Marcia L. Lux, and Robert Ruiz et al. KIT activation is a ubiquitous feature of gastrointestinal stromal tumors. Cancer Research, 61(22).
110 Michael C. Heinrich, Christopher L. Corless, Anette Duensing, Laura McGreevey, Chang-Jie Chen, and Nora Joseph et al. PDGFRA activating mutations in gastrointestinal stromal tumors. Science, 299(5607).
111 Louis Guillou and Alain Aurias. Soft tissue sarcomas with complex genomic profiles. Virchows Archiv, 456(2).

112 Jeff M. Hall, Ming K. Lee, Beth Newman, Jan E. Morrow, Lee A. Anderson, Bing Huey, and Marie-Claire King. Linkage of early-onset familial breast cancer to chromosome 17q21. Science, 250(4988):1684.
113 Richard Wooster, Susan L. Neuhausen, Jonathan Mangion, Yvette Quirk, Deborah Ford, and Nadine Collins et al. Localization of a breast cancer susceptibility gene, BRCA2, to chromosome 13q. Science, 265(5181).
114 Walter F. Bodmer, Carolyn J. Bailey, Julia G. Bodmer, H.J.R. Bussey, Anthony Ellis, and Patricia Gorman et al. Localization of the gene for familial adenomatous polyposis on chromosome 5. Nature, 328(6131).
115 Paivi Peltomaki, Lauri A. Aaltonen, Pertti Sistonen, Lea Pylkkanen, Jukka-Pekka Mecklin, and Heikki Jarvinen et al. Genetic mapping of a locus predisposing to human colorectal cancer. Science, 260(5109).
116 Annika Lindblom, Pia Tannergard, Barbro Werelius, and Magnus Nordenskjold. Genetic mapping of a second locus predisposing to hereditary non-polyposis colon cancer. Nature Genetics, 5(3).
117 Lisa A. Cannon-Albright, David E. Goldgar, Laurence J. Meyer, Cathryn M. Lewis, David E. Anderson, and J.W. Fountain et al. Assignment of a locus for familial melanoma, MLM, to chromosome 9p13-p22. Science, 258(5085):1148.
118 Anglian Breast Cancer Study Group. Prevalence and penetrance of BRCA1 and BRCA2 mutations in a population-based series of breast cancer cases. British Journal of Cancer, 83(10):1301.
119 Kirsi Syrjakoski, Pia Vahteristo, Hannaleena Eerola, Anitta Tamminen, Kati Kivinummi, and Laura Sarantaus et al. Population-based study of BRCA1 and BRCA2 mutations in 1035 unselected Finnish breast cancer patients. Journal of the National Cancer Institute, 92(18).
120 Gudrun Johannesdottir, Julius Gudmundsson, Jon T. Bergthorsson, Adalgeir Arason, Bjarni A. Agnarsson, and Gudny Eiriksdottir et al. High prevalence of the 999del5 mutation in Icelandic breast and ovarian cancer patients. Cancer Research, 56(16).
121 Steinunn Thorlacius, Stefan Sigurdsson, Helga Bjarnadottir, Gudridur Olafsdottir, Jon Gunnlaugur Jonasson, and Laufey Tryggvadottir et al. Study of a single BRCA2 mutation with high carrier frequency in a small population. American Journal of Human Genetics, 60(5):1079.
122 Patricia Hartge, Jeffery P. Struewing, Sholom Wacholder, Lawrence C. Brody, and Margaret A. Tucker. The prevalence of common BRCA1 and BRCA2 mutations among Ashkenazi Jews. The American Journal of Human Genetics, 64(4).
123 Steinunn Thorlacius, Jeffery P. Struewing, Patricia Hartge, Gudridur H. Olafsdottir, Helgi Sigvaldason, and Laufey Tryggvadottir et al. Population-based study of risk of breast cancer in carriers of BRCA2 mutation. The Lancet, 352(9137).
124 Bruce A.J. Ponder. Cancer genetics. Nature, 411(6835).
125 Joel N. Hirschhorn and Mark J. Daly. Genome-wide association studies for common diseases and complex traits. Nature Review Genetics, 6(2):95-108.
126 Tony Burdett, Peggy N. Hall, Emma Hastings, Lucia A. Hindorff, and Heather A. Junkins. The NHGRI-EBI Catalog of published genome-wide association studies.
127 Andrew D. Beggs and Shirley V. Hodgson. Genomics and breast cancer: the different levels of inherited susceptibility. European Journal of Human Genetics, 17(7).
128 Teri A. Manolio, Francis S. Collins, Nancy J. Cox, David B. Goldstein, Lucia A. Hindorff, and David J. Hunter et al. Finding the missing heritability of complex diseases. Nature, 461(7265).
129 Jon McClellan and Mary-Claire King. Genetic heterogeneity in human disease. Cell, 141(2).

194 130 Frederick Sanger, Steven Nicklen, and Alan R. Coulson. DNA sequencing with chain-terminating inhibitors. Proceedings of the National Academy of Sciences, 74(12): , Marcel Margulies, Michael Egholm, William E. Altman, Said Attiya, Joel S. Bader, and Lisa A. Bemben et al. Genome sequencing in microfabricated high-density picolitre reactors. Nature, 437(7057): , Erwin L. van Dijk, Helene Auger, Yan Jaszczyszyn, and Claude Thermes. Ten years of next-generation sequencing technology. 30(9): , Trends in Genetics, 133 Daniel C. Koboldt, Karyn Meltz Steinberg, David E. Larson, Richard K. Wilson, and Elaine R. Mardis. The next-generation sequencing revolution and its impact on genomics. Cell, 155(1):27 38, Sally Bamford, Emily Dawson, Simon Forbes, Jody Clements, Roger Pettett, and Ahmet Dogan et al. The COSMIC (catalogue of somatic mutations in cancer) database and website. British Journal of Cancer, 91(2): , The Cancer Genome Atlas Research Network, John N. Weinstein, Eric A. Collisson, Gordon B. Mills, Kenna R. Mills Shaw, Brad A. Ozenberger, and Kyle Ellrott et al. The cancer genome atlas pan-cancer analysis project. Nature Genetics, 45(10): , Thomas J. Hudson, Warwick Anderson, Axel Aretz, Anna D. Barker, Cindy Bell, and Rosa R. Bernabe et al. International network of cancer genome projects. Nature, 464(7291): , Veronique G. LeBlanc and Marco A. Marra. Next-generation sequencing approaches in cancer: Where have they brought us and where will they take us? Cancers, 7(3): , Elaine R. Mardis. Next-generation DNA sequencing methods. Annual Review of Genomics and Human Genetics, 9(1): , Michael L. Metzker. Sequencing technologies - the next generation. Nature Reviews Genetics, 11(1):31 46,

195 140 David N. Cooper. The nature and mechanisms of human gene mutation. The Metabolic and Molecular Bases of Inherited Disease, pages , Sarah B. Ng, Emily H. Turner, Peggy D. Robertson, Steven D. Flygare, Abigail W. Bigham, and Choli Lee et al. Targeted capture and massively parallel sequencing of 12 human exomes. Nature, 461(7261): , Sarah B. Ng, Abigail W. Bigham, Kati J. Buckingham, Mark C. Hannibal, Margaret J. McMillin, and Heidi I. Gildersleeve et al. Exome sequencing identifies MLL2 mutations as a cause of Kabuki syndrome. Nature Genetics, 42(9): , Alexander Hoischen, Bregje W.M. van Bon, Christian Gilissen, Peer Arts, Bart van Lier, and Marloes Steehouwer et al. De novo mutations of SETBP1 cause Schinzel-Giedion syndrome. Nature Genetics, 42(6): , Sarah B. Ng, Kati J. Buckingham, Choli Lee, Abigail W. Bigham, Holly K. Tabor, and Karin M. Dent et al. Exome sequencing identifies the cause of a Mendelian disorder. Nature Genetics, 42(1):30 35, Jun Ling Wang, Xu Yang, Kun Xia, Zheng Mao Hu, Ling Weng, and Xin Jin et al. TGM6 identified as a novel causative gene of spinocerebellar ataxias using exome sequencing. Brain, 133(12): , Chee-Seng Ku, Nasheen Naidoo, and Yudi Pawitan. Revisiting Mendelian disorders through exome sequencing. Human Genetics, 129(4): , Jessada Thutkawkorapin, Simone Picelli, Vinaykumar Kontham, Tao Liu, Daniel Nilsson, and Annika Lindblom. Exome sequencing in one family with gastric- and rectal cancer. BMC Genetics, 17:41, Matthew Meyerson, Stacey Gabriel, and Gad Getz. Advances in understanding cancer genomes through second-generation sequencing. Nature Review Genetics, 11(10): , Riyue Bao, Lei Huang, Jorge Andrade, Wei Tan, Warren A. Kibbe, Hongmei Jiang, and Gang Feng. Review of current methods, applications, and data management for the bioinformatics analysis of whole exome sequencing. Cancer Informatics, 13(Suppl 2):67 82,

196 150 Kristian Cibulskis, Michael S. Lawrence, Scott L. Carter, Andrey Sivachenko, David Jaffe, and Carrie Sougnez et al. Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nature Biotechnology, 31(3): , Qingguo Wang, Peilin Jia, Fei Li, Haiquan Chen, Hongbin Ji, and Donald Hucks et al. Detecting somatic point mutations in cancer genome sequencing data: a comparison of mutation callers. Genome Medicine, 5(10):1 8, Xiaofeng Zhu, Tao Feng, Yali Li, Qing Lu, and Robert C. Elston. Detecting rare variants for complex traits using family and unrelated data. Genetic Epidemiology, 34(2): , Tao Feng, Robert C. Elston, and Xiaofeng Zhu. Detecting rare and common variants for complex traits: sibpair and odds ratio weighted sum statistics (SPWSS, ORWSS). Genetic Epidemiology, 35(5): , Iuliana Ionita-Laza and Ruth Ottman. Study designs for identification of rare disease variants in complex diseases: the utility of family-based designs. Genetics, 189(3): , Gang Shi and D.C. Rao. Optimum designs for next-generation sequencing to discover rare variants for common complex disease. Genetic Epidemiology, 35(6): , Colin C. Pritchard, Christina Smith, Stephen J. Salipante, Ming K. Lee, Anne M. Thornton, and Alex S. Nord et al. ColoSeq provides comprehensive Lynch and polyposis syndrome mutational analysis using massively parallel sequencing. The Journal of Molecular Diagnostics, 14(4): , Tom Walsh, Ming K. Lee, Silvia Casadei, Anne M. Thornton, Sunday M. Stray, and Christopher Pennil et al. Detection of inherited mutations for breast and ovarian cancer using genomic capture and massively parallel sequencing. Proceedings of the National Academy of Sciences, 107(28): , Duncan Thomas, Zhao Yang, and Fan Yang. Two-phase and family-based designs for next-generation sequencing studies. Frontiers in Genetics, 4(276),

197 159 Nazneen Rahman. Realizing the promise of cancer predisposition genes. Nature, 505(7483): , Nazneen Rahman. Mainstreaming genetic testing of cancer predisposition genes. Clinical Medicine, 14(4): , Victor A. McKusick. Mendelian Inheritance in Man and Its Online Version, OMIM. American Journal of Human Genetics, 80(4): , Olivia Fletcher and Richard S. Houlston. Architecture of inherited susceptibility to common cancer. Nature Reviews Cancer, 10(5): , David M. Thomas and Mandy L. Ballinger. Inherited and de novo germline TP53 mutations in adult-onset sarcoma. Practice, 10(2):A26, Hereditary Cancer in Clinical 164 Levi A. Garraway and Eric S. Lander. Lessons from the cancer genome. Cell, 153(1):17 37, Himisha Beltran, Davide Prandi, Juan Miguel Mosquera, Matteo Benelli, Loredana Puca, and Joanna Cyrta et al. Divergent clonal evolution of castration-resistant neuroendocrine prostate cancer. Nature Medicine, 22(3): , Peter D. Stenson, Edward V. Ball, Katy Howells, Andrew D. Phillips, Matthew Mort, and David N. Cooper. The human gene mutation database: providing a comprehensive central mutation database for molecular diagnostics and personalised genomics. Human Genomics, 4(2):69, Murim Choi, Ute I. Scholl, Weizhen Ji, Tiewen Liu, Irina R. Tikhonova, and Paul Zumbo et al. Genetic diagnosis by whole exome capture and massively parallel DNA sequencing. Proceedings of the National Academy of Sciences, 106(45): , Dale Hedges, Dan Burges, Eric Powell, Cherylyn Almonte, Jia Huang, and Stuart Young et al. Exome sequencing of a multigenerational human pedigree. PLOS ONE, 4(12):e8232,

198 169 David Botstein and Neil Risch. Discovering genotypes underlying human phenotypes: past successes for mendelian disease, future approaches for complex disease. Nature Genetics, 33(3s):228, Urs A. Meyer. Pharmacogenetics and adverse drug reactions. The Lancet, 356(9242): , Urs A. Meyer, Ulrich M. Zanger, and Matthias Schwab. Omics and drug response. Annual Review of Pharmacology and Toxicology, 53(1): , Barry Merriman, Ion Torrent Development Team, and Jonathan M. Rothberg. Progress in Ion Torrent semiconductor chip based sequencing. Electrophoresis, 33(23): , Martin Mascher, Shuangye Wu, Paul St Amand, Nils Stein, and Jesse Poland. Application of genotyping-by-sequencing on semiconductor sequencing platforms: a comparison of genetic and reference-based marker ordering in barley. PLOS ONE, 8(10):e76925, Nicholas J. Loman, Raju V. Misra, Timothy J. Dallman, Chrystala Constantinidou, Saheer E. Gharbia, John Wain, and Mark J. Pallen. Performance comparison of benchtop high-throughput sequencing platforms. Nature Biotechnology, 30(5): , Australasian Sarcoma Study Group. International sarcoma kindred study, URL: Gillian Mitchell, Mandy L. Ballinger, Stephen Wong, Chelsee Hewitt, Paul James, and Mary-Anne Young et al. High frequency of germline TP53 mutations in a prospective adult-onset sarcoma cohort. PLOS ONE, 8(7):1 7, Gang Peng, Jasmina Bojadzieva, Mandy L. Ballinger, Jialu Li, Amanda L. Blackford, and Phuong L. Mai et al. Estimating TP53 mutation carrier probability in families with Li-Fraumeni syndrome using LFSPRO. Cancer Epidemiology and Prevention Biomarkers, pages cebp ,

199 178 Mandy L. Ballinger, David L. Goode, Isabelle Ray-Coquard, Paul A. James, Gillian Mitchell, and Eveline Niedermayr et al. Monogenic and polygenic determinants of sarcoma risk: an international genetic study. The Lancet Oncology, 17(9): , Petr Danecek, Adam Auton, Goncalo Abecasis, Cornelis A. Albers, Eric Banks, and Mark A. DePristo et al. The variant call format and vcftools. Bioinformatics, 27(15): , Aaron McKenna, Matthew Hanna, Eric Banks, Andrey Sivachenko, Kristian Cibulskis, and Andrew Kernytsky et al. The genome analysis toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Research, 20(9): , Heng Li, Bob Handsaker, Alec Wysoker, Tim Fennell, Jue Ruan, and Nils Homer et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics, 25(16): , GATK Documentation. Variant quality score recalibration (VQSR), URL: James T. Robinson, Helga Thorvaldsdottir, Wendy Winckler, Mitchell Guttman, Eric S. Lander, Gad Getz, and Jill P. Mesirov. genomics viewer. Nature Biotechnology, 29(1):24 26, Integrative 184 Helga Thorvaldsdottir, James T. Robinson, and Jill P. Mesirov. Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Briefings in Bioinformatics, 14(2): , Michael J. Clark, Rui Chen, Hugo Y. K. Lam, Konrad J. Karczewski, Rong Chen, and Ghia Euskirchen et al. Performance comparison of exome DNA sequencing technologies. Nature Biotechnology, 29(10): , Alison M. Meynert, Louise S. Bicknell, Matthew E. Hurles, Andrew P. Jackson, and Martin S. Taylor. Quantifying single nucleotide variant detection sensitivity in exome sequencing. BMC Bioinformatics, 14(1):1,

200 187 Robert P. VanderWaal, Douglas R. Spitz, Cara L. Griffith, Ryuji Higashikubo, and Joseph L. Roti Roti. Evidence that protein disulfide isomerase (PDI) is involved in DNA-nuclear matrix anchoring. Journal of Cellular Biochemistry, 85(4): , Henry B. Mann and Donald R. Whitney. On a test of whether one of two random variables is stochastically larger than the other. Mathematical Statistics, pages 50 60, The Annals of 189 William Bateson and Gregor Mendel. Mendel s principles of heredity. University press, Ingrid B. Borecki and Michael A. Province. Genetic and genomic discovery using family studies. Circulation, 118(10): , Diana Merino and David Malkin. p53 and hereditary cancer. In Deb Swati Palit and Deb Sumitra, editors, Mutant p53 and MDM2 in Cancer, pages Springer Netherlands, Dordrecht, Joanne Ngeow and Charis Eng. Precision medicine in heritable cancer: when somatic tumour testing and germline mutations meet. NPJ Genomic Medicine, 1:15006, Edward D. Lustbader, Wick R. Williams, Melissa L. Bondy, Sara Strom, and Louise C. Strong. Segregation analysis of cancer in families of childhood soft-tissue-sarcoma patients. American Journal of Human Genetics, 51(2): , Biljana Novakovic, Alisa M. Goldstein, Leonard H. Wexler, and Margaret A. Tucker. Increased risk of neuroectodermal tumors and stomach cancer in relatives of patients with Ewing s sarcoma family of tumors. Journal of the National Cancer Institute, 86(22): , Ann L. Hartley, Jillian M. Birch, Val Blair, Anna M. Kelsey, Martin Harris, and Patricia H. Morris Jones. Patterns of cancer in the families of children with soft tissue sarcoma. Cancer, 72(3): , Eileen Burke, Frederick P. Li, Abbe J. Janov, Stephen Batter, Holcombe Grier, and Allen Goorin. Cancer in relatives of survivors of childhood sarcoma. Cancer, 67(5): ,

201 197 Kevin B. Jones, Joshua D. Schiffman, Wendy Kohlmann, R. Lor Randall, Stephen L. Lessnick, and Lisa A. Cannon-Albright. Complex genotype sarcomas display familial inheritance independent of known cancer predisposition syndromes. Cancer Epidemiology Biomarkers & Prevention, 20(5): , Henry T. Lynch, Gabriel M. Mulcahy, Randall E. Harris, Hoda A. Guirgis, and Jane F. Lynch. Genetic and pathologic findings in a kindred with hereditary sarcoma breast cancer, brain tumors, leukemia, lung, laryngeal, and adrenal cortical carcinoma. Cancer, 41: , Wick R. Williams and Louise C. Strong. Genetic epidemiology of soft tissue sarcomas in children. In Familial Cancer, pages Karger Publishers, Henry T. Lynch, Randall E. Brand, David Hogg, Carolyn A. Deters, Ramon M. Fusaro, and Jane F. Lynch et al. Phenotypic variation in eight extended CDKN2A germline mutation familial atypical multiple mole melanoma-pancreatic carcinoma-prone families. Cancer, 94(1):84 96, Stephen J. Rulyak, Teresa A. Brentnall, Henry T. Lynch, and Melissa A. Austin. Characterization of the neoplastic phenotype in the familial atypical multiple mole melanoma pancreatic carcinoma syndrome. Cancer, 98(4): , Sophie Sun, Pamela M. Pollock, Ling Liu, Sepideh Karimi, Serge Jothy, and Benedict J. Milner et al. CDKN2A mutation in a non-fammm kindred with cancers at multiple sites results in a functionally abnormal protein. International Journal of Cancer, 73(4): , Rodney C.P. Go, Mary-Claire King, Joan Bailey-Wilson, Robert C. Elston, and Henry T. Lynch. Genetic epidemiology of breast cancer and associated cancers in high-risk families. I. Segregation analysis. Journal of the National Cancer Institute, 71(3): , Henry T. Lynch, Carolyn A. Deters, David Hogg, Jane F. Lynch, Yulia Kinarsky, and Zoran Gatalica. Familial sarcoma. Cancer, 98(9): ,

202 205 Audrey H. Schnell and John S. Witte. Family-based study designs. In Timothy R. Rebeck, Christine B. Ambrosone, and Peter G. Shields, editors, Molecular Epidemiology: Applications in Cancer and Other Human Diseases, pages Taylor & Francis, Steven A. Narod, Deborah Ford, Peter Devilee, Rosa B. Barkardottir, Henry T. Lynch, and Simon A. Smith et al. An evaluation of genetic heterogeneity in 145 breast-ovarian cancer families. American Journal of Human Genetics, 56(1): , Jared C. Roach, Gustavo Glusman, Arian F.A. Smit, Chad D. Huff, Robert Hubley, and Paul T. Shannon et al. Analysis of genetic inheritance in a family quartet by whole-genome sequencing. Science, 328(5978): , Jianxin Shi, Xiaohong R. Yang, Bari Ballew, Melissa Rotunno, Donato Calista, and Maria Concetta Fargnoli et al. Rare missense variants in POT1 predispose to familial cutaneous malignant melanoma. Nature Genetics, 46(5): , Leslie G. Biesecker. Exome sequencing makes medical genomics a reality. Nature Genetics, 42(1):13 15, Michael J. Bamshad, Sarah B. Ng, Abigail W. Bigham, Holly K. Tabor, Mary J. Emond, Deborah A. Nickerson, and Jay Shendure. Exome sequencing as a tool for Mendelian disease gene discovery. Nature Reviews Genetics, 12(11): , Gregory V. Kryukov, Len A. Pennacchio, and Shamil R. Sunyaev. Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics, 80(4): , Colin C. Pritchard, Stephen J. Salipante, Karen Koehler, Christina Smith, Sheena Scroggins, and Brent Wood et al. Validation and implementation of targeted capture and sequencing for the detection of actionable mutation, copy number variation, and gene rearrangement in clinical cancer specimens. The Journal of Molecular Diagnostics, 16(1):56 67,

203 213 Antonija Kreso, Catherine A. O Brien, Peter van Galen, Olga I. Gan, Faiyaz Notta, and Andrew M.K. Brown et al. Variable clonal repopulation dynamics influence chemotherapy response in colorectal cancer. Science, 339(6119): , Sreenath V. Sharma, Daphne W. Bell, Jeffrey Settleman, and Daniel A. Haber. Epidermal growth factor receptor mutations in lung cancer. Nature Reviews Cancer, 7(3): , Paul B. Chapman, Axel Hauschild, Caroline Robert, John B. Haanen, Paolo Ascierto, and James Larkin et al. Improved survival with vemurafenib in melanoma with BRAF V600E mutation. New England Journal of Medicine, 2011(364): , David M. Thomas and Mandy L. Ballinger. Diagnosis and management of hereditary sarcoma. In Rare Hereditary Cancers, pages Springer, Navnath S. Gavande, Pamela S. VanderVere-Carozza, Hilary D. Hinshaw, Shadia I. Jalal, Catherine R. Sears, Katherine S. Pawelczak, and John J. Turchi. DNA repair targeted therapy: The past or future of cancer treatment? Pharmacology & Therapeutics, 160:65 83, David C. Samuels, Leng Han, Jiang Li, Sheng Quanghu, Travis A. Clark, Yu Shyr, and Yan Guo. Finding the lost treasures in exome sequencing data. Trends in Genetics, 29(10): , Malte Spielmann and Stefan Mundlos. Looking beyond the genes: the role of non-coding variants in human disease. Human Molecular Genetics, 25(R2):R157 R165, Graham R.S. Ritchie and Paul Flicek. Computational approaches to interpreting genomic sequence variation. Genome Medicine, 6(10):87, Paul D.P. Pharoah, Alison M. Dunning, Bruce A.J. Ponder, and Douglas F. Easton. Association studies for finding cancer-susceptibility genetic variants. Nature Reviews Cancer, 4(11): ,

204 222 Susanne Horn, Adina Figl, P. Sivaramakrishna Rachakonda, Christine Fischer, Antje Sucker, Andreas Gast, Stephanie Kadel, Iris Moll, Eduardo Nagore, and Kari Hemminki. Tert promoter mutations in familial and sporadic melanoma. Science, 339(6122): , The ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature, 489(7414):57 74, Amanda Warr, Christelle Robert, David Hume, Alan Archibald, Nader Deeb, and Mick Watson. Exome sequencing: current and future perspectives. G3: Genes Genomes Genetics, 5(8): , Ken Chen, John W. Wallis, Michael D. McLellan, David E. Larson, Joelle M. Kalicki, Craig S. Pohl, and et al. Breakdancer: an algorithm for high-resolution mapping of genomic structural variation. Nature Methods, 6(9): , Can Alkan, Bradley P. Coe, and Evan E. Eichler. Genome structural variation discovery and genotyping. Nature Reviews Genetics, 12(5): , Biao Liu, Jeffrey M. Conroy, Carl D. Morrison, Adekunle O. Odunsi, Maochun Qin, Lei Wei, and et al. Structural variation discovery in the cancer genome using next generation sequencing: computational solutions and perspectives. Oncotarget, 6(8): , Shengpei Chen, Sheng Li, Weiwei Xie, Xuchao Li, Chunlei Zhang, and Haojun Jiang et al. Performance comparison between rapid sequencing platforms for ultra-low coverage sequencing strategy. PLOS ONE, 9(3):e92192, Joseph F. Boland, Charles C. Chung, David Roberson, Jason Mitchell, Xijun Zhang, and Kate M. Im et al. The new sequencer on the block: comparison of Life Technology s Proton sequencer to an Illumina HiSeq for whole-exome sequencing. Human Genetics, 132(10): , Eric Samorodnitsky, Benjamin M. Jewell, Raffi Hagopian, Jharna Miya, Michele R. Wing, and Ezra Lyon et al. Evaluation of hybridization capture 173

205 versus amplicon-based methods for whole-exome sequencing. Mutation, 36(9): , Human 231 Pankaj Kumar, Mashael Al-Shafai, Wadha Ahmed Al Muftah, Nader Chalhoub, Mahmoud F. Elsaid, Alice Abdel Aleem, and Karsten Suhre. Evaluation of SNP calling using single and multiple-sample calling algorithms by validation against array base genotyping and Mendelian inheritance. BMC Research Notes, 7:747, Pengyuan Zhu, Lingyu He, Yaqiao Li, Wenpan Huang, Feng Xi, and Lin Lin et al. OTG-snpcaller: an optimized pipeline based on TMAP and GATK for SNP calling from Ion Torrent data. PLOS ONE, 9(5):e97507, Xiangtao Liu, Shizhong Han, Zuoheng Wang, Joel Gelernter, and Bao-Zhu Yang. Variant callers for next-generation sequencing data: a comparison study. PLOS ONE, 8(9):e75619, Su Yeon Kim, Laurent Jacob, and Terence P. Speed. Combining calls from multiple somatic mutation-callers. BMC Bioinformatics, 15(1):154, Ikuko N. Motoike, Mitsuyo Matsumoto, Inaho Danjoh, Fumiki Katsuoka, Kaname Kojima, and Naoki Nariai et al. Validation of multiple single nucleotide variation calls by additional exome analysis with a semiconductor sequencer to supplement data of whole-genome sequencing of a human population. BMC Genomics, 15(1):673, Daniel G. MacArthur, Teri. A. Manolio, David P. Dimmock, Heidi L. Rehm, Jay Shendure, and Goncalo R. Abecasis et al. Guidelines for investigating causality of sequence variants in human disease. Nature, 508(7497): , LaDeana W. Hillier, Gabor T. Marth, Aaron R. Quinlan, David Dooling, Ginger Fewell, and Derek Barnett et al. Whole-genome sequencing and variant discovery in C. elegans. Nature Methods, 5(2): , Christian Gilissen, Alexander Hoischen, Han G. Brunner, and Joris A. Veltman. Disease gene identification strategies for exome sequencing. European Journal of Human Genetics, 20(5): ,

206 239 Mahjoubeh Jalali Sefid Dashti and Junaid Gamieldien. Identifying candidate function-impacting variants. BioTechniques, 62(1):18 30, Damian Smedley and Peter N. Robinson. Phenotype-driven strategies for exome prioritization of human Mendelian disease genes. Genome Medicine, 7(1):81, Vincent J. Henry, Anita E. Bandrowski, Anne-Sophie Pepin, Bruno J. Gonzalez, and Arnaud Desfeux. OMICtools: an informative directory for multi-omic data analysis. Database, 2014:bau069 bau069, Stephan Pabinger, Andreas Dander, Maria Fischer, Rene Snajder, Michael Sperk, and Mirjana Efremova et al. A survey of tools for variant analysis of next-generation genome sequencing data. Briefings in Bioinformatics, 15(2): , Min Zhao and Zhongming Zhao. CNVannotator: a comprehensive annotation server for copy number variation in the human genome. 8(11):e80170, PLOS ONE, 244 Eric R. Gamazon, Wei Zhang, Anuar Konkashbaev, Shiwei Duan, Emily O. Kistner, and Dan L. Nicolae et al. SCAN: SNP and copy number annotation. Bioinformatics, 26(2): , Kai Wang, Mingyao Li, and Hakon Hakonarson. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Research, 38(16):e164, Kai Wang, Mingyao Li, Dexter Hadley, Rui Liu, Joseph Glessner, and Struan F.A. Grant et al. PennCNV: an integrated hidden Markov model designed for high-resolution copy number variation detection in whole-genome SNP genotyping data. Genome Research, 17(11): , Vladimir Makarov, Tina O Grady, Guiqing Cai, Jayon Lihm, Joseph D. Buxbaum, and Seungtai Yoon. AnnTools: a comprehensive and versatile annotation toolkit for genomic variants. Bioinformatics, 28(5): , Ryan L. Collins, Matthew R. Stone, Harrison Brand, Joseph T. Glessner, and Michael E. Talkowski. CNView: a visualization and annotation tool for 175

207 copy number variation from whole-genome sequencing. biorxiv, page , Yuanwei Zhang, Zhenhua Yu, Rongjun Ban, Huan Zhang, Furhan Iqbal, and Aiwu Zhao et al. DeAnnCNV: a tool for online detection and annotation of copy number variations from whole-exome sequencing data. Nucleic Acids Research, 43(W1):W289 W294, Galina A. Erikson, Neha Deshpande, Balachandar G. Kesavan, and Ali Torkamani. SG-ADVISER CNV: copy-number variant annotation and interpretation. Genetics in Medicine, 17(9): , Stephen T. Sherry, Ming H. Ward, Michael Kholodov, Jonathan Baker, Lon Phan, Elizabeth M. Smigielski, and Karl Sirotkin. dbsnp: the NCBI database of genetic variation. Nucleic Acids Research, 29(1): , Consortium Genomes Project. A map of human genome variation from population-scale sequencing. Nature, 467(7319): , Feng Zhang and James R. Lupski. Non-coding genetic variants in human disease. Human Molecular Genetics, 24(R1):R102 R110, Anna-Maija Sulonen, Pekka Ellonen, Henrikki Almusa, Maija Lepisto, Samuli Eldfors, and Sari Hannula et al. Comparison of solution-based exome capture methods for next generation sequencing. Genome Biology, 12(9):R94, Yu Xu, Hui Jiang, Chris Tyler-Smith, Yali Xue, Tao Jiang, and Jiawei Wang et al. Comprehensive comparison of three commercial human whole-exome capture platforms. Genome Biology, 12(9):1, Yan Guo, Jirong Long, Jing He, Chung-I. Li, Qiuyin Cai, and Xiao-Ou Shu et al. Exome sequencing generates high quality data in non-target regions. BMC Genomics, 13(1):194, Alan P. Boyle, Eurie L. Hong, Manoj Hariharan, Yong Cheng, Marc A. Schaub, and Maya Kasowski et al. Annotation of functional variation in personal genomes using RegulomeDB. Genome Research, 22(9): ,

208 258 Matthew R. Nelson, Daniel Wegmann, Margaret G. Ehm, Darren Kessner, Pamela St Jean, and Claudio Verzilli et al. An abundance of rare functional variants in 202 drug target genes sequenced in 14,002 people. Science, 337(6090): , Elizabeth T. Cirulli and David B. Goldstein. Uncovering the roles of rare variants in common disease through whole-genome sequencing. Reviews Genetics, 11(6): , Nature 260 Exome Variant Server. Exome variant server, URL: Monkol Lek, Konrad J. Karczewski, Eric V. Minikel, Kaitlin E. Samocha, Eric Banks, and Timothy Fennell et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature, 536(7616): , Eugene V. Davydov, David L. Goode, Marina Sirota, Gregory M. Cooper, Arend Sidow, and Serafim Batzoglou. Identifying a high fraction of the human genome to be under selective constraint using GERP++. PLOS Computational Biology, 6(12):e , Lisenka E.L.M. Vissers, Joep de Ligt, Christian Gilissen, Irene Janssen, Marloes Steehouwer, and Petra de Vries et al. A de novo paradigm for mental retardation. Nature Genetics, 42(12): , Gregory M. Cooper, David L. Goode, Sarah B. Ng, Arend Sidow, Michael J. Bamshad, Jay Shendure, and Deborah A. Nickerson. Single-nucleotide evolutionary constraint scores highlight disease-causing mutations. Nature Methods, 7(4): , Michael Krawczak, Edward V. Ball, Iain Fenton, Peter D. Stenson, Shaun Abeysinghe, Nick Thomas, and David N. Cooper. Human gene mutation database - a biomedical information and research resource. Human Mutation, 15(1):45, Pauline C. Ng and Steven Henikoff. SIFT: Predicting amino acid changes that affect protein function. Nucleic Acids Research, 31(13): ,

209 267 Ivan Adzhubei, Daniel M. Jordan, and Shamil R. Sunyaev. Predicting functional effect of human missense mutations using PolyPhen-2. Current Protocols in Human Genetics, pages 7 20, Holly K. Tabor, Neil J. Risch, and Richard M. Myers. Candidate-gene approaches for studying complex genetic traits: practical considerations. Nature Review Genetics, 3(5): , Jennifer M. Kwon and Alison M. Goate. The candidate gene approach. Alcohol Research and Health, 24(3): , Nadav Ahituv, Nihan Kavaslar, Wendy Schackwitz, Anna Ustaszewska, Joel Martin, and Sybil Hebert et al. Medical sequencing at the extremes of human body mass. The American Journal of Human Genetics, 80(4): , Amelie Bonnefond, Nathalie Clement, Katherine Fawcett, Loic Yengo, Emmanuel Vaillant, and Jean-Luc Guillaume et al. Rare MTNR1B variants impairing melatonin receptor 1B function contribute to type 2 diabetes. Nature Genetics, 44(3): , Jonathan C. Cohen, Robert S. Kiss, Alexander Pertsemlidis, Yves L. Marcel, Ruth McPherson, and Helen H. Hobbs. Multiple rare alleles contribute to low plasma levels of HDL cholesterol. Science, 305(5685): , Dorothee Diogo, Fina Kurreeman, Eli A. Stahl, Katherine P. Liao, Namrata Gupta, and Jeffrey D. Greenberg et al. Rare, low-frequency, and common variants in the protein-coding sequence of biological candidate genes from GWASs contribute to risk of rheumatoid arthritis. The American Journal of Human Genetics, 92(1):15 27, Weizhen Ji, Jia Nee Foo, Brian J. O Roak, Hongyu Zhao, Martin G. Larson, and David B. Simon et al. Rare independent mutations in renal salt handling genes contribute to blood pressure variation. Nature Genetics, 40(5): , Guoqing Diao and D.Y. Lin. Variance-components methods for linkage and association analysis of ordinal traits in general pedigrees. Epidemiology, 34(3): , Genetic 178


232 spermidine and its use for non-invasive detection of colorectal cancer. BMC Biotechnology, 15(1):41 49, Nadia Ashour, Javier C. Angulo, Guillermo Andres, Raul Alelu, Ana Gonzalez-Corpas, and Maria V. Toledo et al. A DNA hypermethylation profile reveals new potential biomarkers for prostate cancer diagnosis and prognosis. The Prostate, 74(12): , Bodour Salhia, Jeff Kiefer, Julianna T.D. Ross, Raghu Metapally, Rae Anne Martinez, and Kyle N. Johnson et al. Integrated genomic and epigenomic analysis of breast cancer brain metastasis. PLOS ONE, 9(1):e85448, Jean-Pierre Roperch, Roberto Incitti, Solene Forbin, Floriane Bard, Hicham Mansour, and Farida Mesli et al. Aberrant methylation of NPY, PENK, and WIF1 as a promising marker for blood-based diagnosis of colorectal cancer. BMC Cancer, 13(1): , Masahiro Shitani, Shigeru Sasaki, Noriyuki Akutsu, Hideyasu Takagi, Hiromu Suzuki, and Masanori Nojima et al. Genome-wide analysis of DNA methylation identifies novel cancer-related genes in hepatocellular carcinoma. Tumor Biology, 33(5): , Yugo Kishida, Atsushi Natsume, Yutaka Kondo, Ichiro Takeuchi, Byonggu An, and Yasuyuki Okamoto et al. Epigenetic subclassification of meningiomas based on genome-wide DNA methylation analyses. Carcinogenesis, 33(2): , Woonbok Chung, Jolanta Bondaruk, Jaroslav Jelinek, Yair Lotan, Shoudan Liang, Bogdan Czerniak, and Jean-Pierre J. Issa. Detection of bladder cancer using novel DNA methylation biomarkers in urine sediments. Cancer Epidemiology Biomarkers & Prevention, 20(7): , Ji Un Kang, Sun Hoe Koo, Kye Chul Kwon, Jong Woo Park, and Jin Man Kim. Gain at chromosomal region 5p15.33, containing TERT, is the most frequent genetic event in early stages of non-small cell lung cancer. Cancer Genetics and Cytogenetics, 182(1):1 11, Yunyu Chen, Jing Zhang, Dongsheng Li, Jiandong Jiang, Yanchang Wang, and Shuyi Si. Identification of a novel Polo-like kinase 1 inhibitor 201

233 that specifically blocks the functions of Polo-Box domain. 8(1): , Oncotarget, 465 Baochi Ou, Jingkun Zhao, Shaopei Guan, Xiongzhi Wangpu, Congcong Zhu, and Yaping Zong et al. PLK2 promotes tumor growth and inhibits apoptosis by targeting Fbxw7/Cyclin E in colorectal cancer. Cancer Letters, 380(2): , Fei Liu, Shimeng Zhang, Zhen Zhao, Xinru Mao, Jinlan Huang, and Zixian Wu et al. MicroRNA-27b up-regulated by human papillomavirus 16 E7 promotes proliferation and suppresses apoptosis by targeting polo-like kinase2 in cervical cancer. Oncotarget, 7(15): , Jia-Hui Xu, Shi-Lian Hu, Guo-Dong Shen, and Gan Shen. Tumor suppressor genes and their underlying interactions in paclitaxel resistance in cancer therapy. Cancer Cell International, 16(1):13 23, M.V. Ramana Reddy, Balireddy Akula, Shashidhar Jatiani, Rodrigo Vasquez-Del Carpio, Vinay K. Billa, and Muralidhar R. Mallireddigari et al. Discovery of 2-(1H-indol-5-ylamino)-6-(2,4-difluorophenylsulfonyl)-8-methylpyrido [2,3-d]pyrimidin-7(8H)-one (7ao) as a potent selective inhibitor of Polo like kinase 2 (PLK2). Bioorganic & Medicinal Chemistry, 24(4): , Zheng Bo Hu, Xiao Hong Liao, Zun Ying Xu, Xiao Yang, Chao Dong, An Min Jin, and Hai Lu. PLK2 phosphorylates and inhibits enriched TAp73 in human osteosarcoma cells. Cancer Medicine, 5(1):74 87, Li Ying Liu, Wei Wang, Ling Yu Zhao, Bo Guo, Juan Yang, and Xiao Ge Zhao et al. Silencing of polo-like kinase 2 increases cell proliferation and decreases apoptosis in SGC-7901 gastric cancer cells. Molecular Medicine Reports, 11(4): , Cheng-Wei Li and Bor-Sen Chen. Investigating core genetic-and-epigenetic cell cycle networks for stemness and carcinogenic mechanisms, and cancer drug design using big database mining and genome-wide next-generation sequencing data. Cell Cycle, 15(19): ,

234 472 Vishal Kothari, Iris Wei, Sunita Shankar, Shanker Kalyana-Sundaram, Lidong Wang, and Linda W. Ma et al. Outlier kinase expression by RNA sequencing as targets for precision therapy. Cancer Discovery, 3(3): , Tobias Berg, Gesine Bug, Oliver G. Ottmann, and Klaus Strebhardt. Polo-like kinases in AML. 21(8): , Expert Opinion on Investigational Drugs, 474 Helen M. Coley, Eleftheria Hatzimichael, Sarah Blagden, Iain McNeish, Alastair Thompson, Tim Crook, and Nelofer Syed. Polo like kinase 2 tumour suppressor and cancer biomarker: new perspectives on drug sensitivity/resistance in ovarian cancer. Oncotarget, 3(1):78 83, Lalji K. Gediya, Aakanksha Khandelwal, Jyoti Patel, Aashvini Belosay, Gauri Sabnis, and Jhalak et al. Mehta. Design, synthesis, and evaluation of novel mutual prodrugs (hybrid drugs) of all-trans-retinoic acid and histone deacetylase inhibitors with enhanced anticancer activities in breast and prostate cancer cells in vitro. Journal of Medicinal Chemistry, 51(13): , Mon-Ju Wu, Mi Ra Kim, Yu-Shan Chen, Jun-Yi Yang, and Chia-Jung Chang. Retinoic acid directs breast cancer cell state changes through regulation of TET2-PKC pathway. Oncogene, 36(22): , Liyan Qu and Xiuwen Tang. Bexarotene: a promising anticancer agent. Cancer Chemotherapy and Pharmacology, 65(2): , Martin P. Powers, Wei-Lien Wang, Vivian S. Hernandez, Kayuri S. Patel, Dina C. Lev, Alexander J. Lazar, and Dolores H. Lopez-Terrada. Detection of myxoid liposarcoma-associated FUS-DDIT3 rearrangement variants including a newly identified breakpoint using an optimized RT-PCR assay. Modern Pathology, 23(10): , Carola Andersson, Henrik Fagman, Magnus Hansson, and Fredrik Enlund. Profiling of potential driver mutations in sarcomas by targeted next generation sequencing. Cancer Genetics, 209(4): ,

235 480 Yoshinao Oda, Hidetaka Yamamoto, Tomonari Takahira, Chikashi Kobayashi, Kenichi Kawaguchi, and Naomi Tateishi et al. Frequent alteration of p16ink4a/p14arf and p53 pathways in the round cell component of myxoid/round cell liposarcoma: p53 gene alterations and reduced p14arf expression both correlate with poor prognosis. The Journal of Pathology, 207(4): , Jordi Barretina, Barry S. Taylor, Shantanu Banerji, Alexis H. Ramos, Mariana Lagos-Quintana, and Penelope L. DeCarolis et al. Subtype-specific genomic alterations define new targets for soft-tissue sarcoma therapy. Nature Genetics, 42(8): , Elizabeth G. Demicco, Keila E. Torres, Markus P. Ghadimi, Chiara Colombo, Svetlana Bolshakov, and Aviad Hoffman et al. Involvement of the PI3K/Akt pathway in myxoid/round cell liposarcoma. Modern Pathology, 25(2): , Tsuyoshi Saito, Keisuke Akaike, Aiko Kurisaki-Arakawa, Midori Toda-Ishii, Kenta Mukaihara, and Yoshiyuki Suehara et al. TERT promoter mutations are rare in bone and soft tissue sarcomas of Japanese patients. Molecular and Clinical Oncology, 4(1):61 64, Christian Koelsche, Marcus Renner, Wolfgang Hartmann, Regine Brandt, Burkhard Lehner, and Nina Waldburger et al. TERT promoter hotspot mutations are recurrent in myxoid liposarcomas but rare in other soft tissue sarcoma entities. Journal of Experimental & Clinical Cancer Research, 33(1):33 40, Marieke A. de Graaff, Jamie S.E. Yu, Hannah C. Beird, Davis R. Ingram, Theresa Nguyen, and Jeffrey Juehui Liu et al. Establishment and characterization of a new human myxoid liposarcoma cell line (DL-221) with the FUS-DDIT3 translocation. Laboratory Investigation, 96(8): , Cristina R. Antonescu, Sylvia J. Tschernyavsky, Ramona Decuseara, Denis H. Leung, James M. Woodruff, and Murray F. Brennan et al. Prognostic impact of P53 status, TLS-CHOP fusion transcript structure, and histological grade in myxoid liposarcoma. Clinical Cancer Research, 7(12): ,

236 487 Aviad Hoffman, Markus P.H. Ghadimi, Elizabeth G. Demicco, Chad J. Creighton, Keila Torres, and Chiara Colombo et al. Localized and metastatic myxoid/round cell liposarcoma. Cancer, 119(10): , Christine G. Joseph, Heejung Hwang, Yuchen Jiao, Laura D. Wood, Isaac Kinde, and Jian Wu et al. Exomic analysis of myxoid liposarcomas, synovial sarcomas, and osteosarcomas. Genes, Chromosomes and Cancer, 53(1):15 24, Sarah Uboldi, Enrica Calura, Luca Beltrame, Ilaria Fuso Nerini, Sergio Marchini, and Duccio Cavalieri et al. A systems biology approach to characterize the regulatory networks leading to trabectedin resistance in an in vitro model of myxoid liposarcoma. PLOS ONE, 7(4):e35423, Walter Pavicic, Esa Perkio, Sippy Kaur, and Paivi Peltomaki. Altered methylation at microrna-associated CpG islands in hereditary and sporadic carcinomas: a methylation-specific multiplex ligation-dependent probe amplification (MS-MLPA)-based approach. Molecular Medicine, 17(7-8): , Lina Albitar, Gavin Pickett, Marilee Morgan, Suzy Davies, and Kimberly K. Leslie. Models representing type I and type II human endometrial cancers: Ishikawa H and Hec50co cells. Gynecologic Oncology, 106(1):52 64, Karin Milde-Langosch, Christoph Goemann, Carola Methner, Gabriele Rieck, Ana-Maria Bamberger, and Thomas Loning. Expression of Rb2/p130 in breast and endometrial cancer: correlations with hormone receptor status. British Journal of Cancer, 85(4): , Amit Nahum, Keren Hirsch, Michael Danilenko, Colin K.W. Watts, Owen W.J. Prall, Joseph Levy, and Yoav Sharoni. Lycopene inhibition of cell cycle progression in breast and endometrial cancer cells is associated with reduction in cyclin D levels and retention of p27kip1 in the cyclin E-cdk2 complexes. Oncogene, 20(26):3428, Tommaso Susini, Daniela Massi, Milena Paglierani, Valeria Masciullo, Giovanni Scambia, and Antonio Giordano et al. Expression of the retinoblastoma-related gene Rb2/p130 is downregulated in atypical 205

237 endometrial hyperplasia and adenocarcinoma. 32(4): , Human Pathology, 495 Tommaso Susini, Feliciano Baldi, Candace M. Howard, Alfonso Baldi, Gianluigi Taddei, and Daniela Massi et al. Expression of the retinoblastoma-related gene Rb2/p130 correlates with clinical outcome in endometrial cancer. Journal of Clinical Oncology, 16(3): , Mina Massaro-Giordano, Gianluca Baldi, Antonio De Luca, Alfonso Baldi, and Antonio Giordano. Differential expression of the retinoblastoma gene family members in choroidal melanoma: prognostic significance. Clinical Cancer Research, 5(6):1455, Maria Pardo, Antonio Pineiro, Maria de la Fuente, Angel Garcia, Sripadi Prabhakar, and Nicole Zitzmann et al. Abnormal cell cycle regulation in primary human uveal melanoma cultures. Journal of Cellular Biochemistry, 93(4): , Vasily A. Yakovlev. Nitric oxide-dependent downregulation of BRCA1 expression promotes genetic instability. Cancer Research, 73(2):706, Cinti Caterina, Macaluso Marcella, and Antonio Giordano. Tumor-specific exon 1 mutations could be the hit event predisposing Rb2/p130 gene to epigenetic silencing in lung cancer. Oncogene, 24(38): , Hu Xue Jun, Akihiko Gemma, Yoko Hosoya, Kuniko Matsuda, Michiya Nara, and Yukio Hosomi et al. Reduced transcription of the RB2/p130 gene in human lung cancer. Molecular Carcinogenesis, 38(3): , Giuseppe Russo, Pier Paolo Claudio, Yan Fu, Peter Stiegler, Zailin Yu, Marcella Macaluso, and Antonio Giordano. prb2/p130 target genes in non-small lung cancer cells identified by microarray analysis. Oncogene, 22(44): , Sanjay Modi, Akihito Kubo, Herbert Oie, Amy B. Coxon, Ahad Rehmatulla, and Frederic J. Kaye. Protein expression of the RB-related gene family and SV40 large T antigen in mesothelioma and lung cancer. Oncogene, 19(40):4632,

238 503 Pier Paolo Claudio, Mario Caputi, and Antonio Giordano. The RB2/p130 gene: the latest weapon in the war against lung cancer? Research, 6(3):754, Clinical Cancer 504 Alfonso Baldi, Vincenzo Esposito, Antonio De Luca, Yan Fu, Ilernando Meoli, and Giovan G. Giordano et al. Differential expression of Rb2/p130 and p107 in normal human tissues and in primary lung cancer. Clinical Cancer Research, 3(10):1691, Luciano Mutti, Antonio De Luca, Pier Paolo Claudio, Giuseppe Convertino, Michele Carbone, and Antonio Giordano. Simian virus 40-like DNA sequences and large-t antigen-retinoblastoma family protein prb2/p130 interaction in human mesothelioma. Developments in Biological Standardization, 94:47 53, Kristian Helin, Karin Holm, Anita Niebuhr, Hans Eiberg, Niels Tommerup, and Susanne Hougaard et al. Loss of the retinoblastoma protein-related p130 protein in small cell lung carcinoma. Proceedings of the National Academy of Sciences of the United States of America, 94(13): , Steven G. Gray, Xiang Guo, Darek Kedra, Bin T. Teh, and Hua-Qing Min. Correspondence re: P.P. Claudio et al., Mutations in the Retinoblastoma-related Gene RB2/p130 in Primary Nasopharyngeal Carcinoma. Cancer Res., 60: 8-12, Cancer Research, 61(15): , Pier Paolo Claudio, Candace M. Howard, Alfonso Baldi, Antonio De Luca, Yan Fu, and Gianluigi Condorelli et al. p130/prb2 has growth suppressive properties similar to yet distinctive from those of retinoblastoma family members prb and p107. Cancer Research, 54(21):5556, Francesco P. Jori, Umberto Galderisi, Elena Piegari, Gianfranco Peluso, Marilena Cipollaro, and Antonio Cascino et al. RB2/p130 ectopic gene expression in neuroblastoma stem cells: evidence of cell-fate restriction and induction of differentiation. Biochemical Journal, 360(3):569, Giuseppe Raschella, Barbara Tanno, Francesco Bonetto, Roberto Amendola, Tullio Battista, and Antonio De Luca et al. Retinoblastoma-related 207

239 protein prb2/p130 and its binding to the B-myb promoter increase during human neuroblastoma differentiation. Journal of Cellular Biochemistry, 67(3): , Giuseppe Raschella, Barbara Tanno, Francesco Bonetto, Anna Negroni, Pier Paolo Claudio, and Alfonso Baldi et al. The RB-related gene Rb2/p130 in neuroblastoma differentiation and in B-myb promoter down-regulation. Cell Death and Differentiation, 5(5): , Riccardo Di Fiore, Antonella D Anneo, Giovanni Tesoriere, and Renza Vento. RB1 in cancer: different mechanisms of RB1 inactivation and alterations of prb pathway in tumorigenesis. Journal of Cellular Physiology, 228(8): , Iva Simeonova, Vincent Lejour, Boris Bardot, Rachida Bouarich-Bourimi, Aurelie Morin, and Ming Fang et al. Fuzzy tandem repeats containing p53 response elements may define species-specific p53 target genes. PLOS Genetics, 8(6):e , Zena Lim and Boon Long Quah. Unilateral retinoblastoma in an eye with Peters anomaly. Journal of American Association for Pediatric Ophthalmology and Strabismus, 14(2): , Paola Indovina, Antonio Acquaviva, Giulia De Falco, Valeria Rizzo, Anna Onnis, and Anna Luzzi et al. Downregulation and aberrant promoter methylation of p16ink4a: a possible novel heritable susceptibility marker to retinoblastoma. Journal of Cellular Physiology, 223(1): , Peh-Yean Cheah. The emerging role of RBL2/p130 in multi-step retinoblastoma tumorigenesis. Cancer Biology & Therapy, 8(8): , Kadam Priya, Srinivasa Rao Jada, Boon Long Quah, Thuan Chong Quah, and Poh San Lai. High incidence of allelic loss at 16q12. 2 region spanning RB2/p130 gene in retinoblastoma. Cancer Biology & Therapy, 8(8): , David MacPherson, Karina Conkrite, Mandy Tam, Shizuo Mukai, David Mu, and Tyler Jacks. Murine bilateral retinoblastoma exhibiting rapid-onset, 208

240 metastatic progression and N-myc gene amplification. The EMBO Journal, 26(3): , Gian Marco Tosi, Carmela Trimarchi, Marcella Macaluso, Dario La Sala, Alfredo Ciccodicola, and Stefano Lazzi et al. Genetic and epigenetic alterations of RB2/p130 tumor suppressor gene in human sporadic retinoblastoma: implications for pathogenesis and therapeutic approach. Oncogene, 24(38): , David MacPherson, Julien Sage, Teresa Kim, Dennis Ho, Margaret E. McLaughlin, and Tyler Jacks. Cell type-specific effects of Rb deletion in the murine retina. Genes & Development, 18(14): , Marie Classon and Ed Harlow. The retinoblastoma tumour suppressor in development and cancer. Nature Reviews Cancer, 2(12): , Cristiana Bellan, Giulia De Falco, Gian Marco Tosi, Stefano Lazzi, Filomena Ferrari, and Giovanna Morbini et al. Missing expression of prb2/p130 in human retinoblastomas is associated with reduced apoptosis and lesser differentiation. Investigative Ophthalmology & Visual Science, 43(12): , William R. Sellers, Bennett G. Novitch, Satoshi Miyake, Agnieszka Heith, Gregory A. Otterson, and Frederic J. Kaye et al. Stable binding to E2F is not required for the retinoblastoma protein to activate transcription, promote differentiation, and suppress tumor cell growth. Genes & Development, 12(1):95 106, Yukiharu Sawada, Hajime Nomura, Yuichi Endo, Kazumi Umeki, Teizo Fujita, Sachiya Ohtaki, and Kei Fujinaga. Cloning and characterization of the rat p130, a member of the retinoblastoma gene family. Biochimica et Biophysica Acta (BBA)-Molecular Basis of Disease, 1361(1):20 27, Alfonso Baldi, Vincenzo Boccia, Pier Paolo Claudio, Antonio De Luca, and Antonio Giordano. Genomic structure of the human retinoblastoma-related Rb2/p130 gene. Proceedings of the National Academy of Sciences of the United States of America, 93(10): ,

241 526 Jacqueline M. Sterner, Yunxia Tao, Sarah B. Kennett, Hyung G. Kim, and Jonathan M. Horowitz. The amino terminus of the retinoblastoma (rb) protein associates with a cyclin-dependent kinase-like kinase via rb amino acids required for growth suppression. Cell Growth & Differentiation, 7(1):53 64, Peter Whyte. The retinoblastoma protein and its relatives. Seminars in Cancer Biology, 6(2):83 90, Hugh Cam and Brian David Dynlacht. Emerging roles for E2F: beyond the G1/S transition and DNA replication. Cancer Cell, 3(4): , Jacob B. Hansen, Hein te Riele, and Karsten Kristiansen. Novel function of the retinoblastoma protein in fat: regulation of white versus brown adipocyte differentiation. Cell Cycle, 3(6): , Victoria M. Richon, Robert E. Lyle, and Robert E. McGehee. Regulation and expression of retinoblastoma proteins p107 and p130 during 3t3-l1 adipocyte differentiation. Journal of Biological Chemistry, 272(15): , Stefania Capasso, Nicola Alessio, Giovanni Di Bernardo, Marilena Cipollaro, Mariarosa Melone, and Gianfranco Peluso et al. Silencing of RB1 and RB2/P130 during adipogenesis of bone marrow stromal cells results in dysregulated differentiation. Cell Cycle, 13(3): , Mark F. Pittenger, Alastair M. Mackay, Stephen C. Beck, Rama K. Jaiswal, Robin Douglas, and Joseph D. Mosca et al. Multilineage potential of adult human mesenchymal stem cells. Science, 284(5411): , Alexander B. Mohseny, Karoly Szuhai, Salvatore Romeo, Emilie P. Buddingh, Inge Briaire-de Bruijn, and Danielle de Jong et al. Osteosarcoma originates from mesenchymal stem cells in consequence of aneuploidization and genomic loss of Cdkn2. The Journal of Pathology, 219(3): , Nedime Serakinci, Per Guldberg, Jorge S. Burns, Basem Abdallah, Henrik Schrodder, Thomas Jensen, and Moustapha Kassem. Adult human mesenchymal stem cell as a target for neoplastic transformation. Oncogene, 23(29): ,

242 535 Ioannis Panagopoulos, M. Hoglund, Fredrik Mertens, Nils Mandahl, Felix Mitelman, and Pierre Aman. Fusion of the EWS and CHOP genes in myxoid liposarcoma. Oncogene, 12(3): , Helene Zinszner, John Sok, David Immanuel, Yin Yin, and David Ron. TLS (FUS) binds RNA in vivo and engages in nucleo-cytoplasmic shuttling. Journal of Cell Science, 110(15):1741, Jessica I. Hoell, Erik Larsson, Simon Runge, Jeffrey D. Nusbaum, Sujitha Duggimpudi, and Thalia A. Farazi et al. RNA targets of wild-type and mutant FET family proteins. Nature Structural & Molecular Biology, 18(12): , Adelene Y. Tan and James L. Manley. The TET family of proteins: Functions and roles in disease. Journal of Molecular Cell Biology, 1(2):82 92, Nicola D. Roberts, R. Daniel Kortschak, Wendy T. Parker, Andreas W. Schreiber, Susan Branford, and Hamish S. Scott et al. A comparative analysis of algorithms for somatic SNV detection in cancer. Bioinformatics, 29(18): , Anne Bruun Kroigard, Mads Thomassen, Anne-Vibeke Laenkholm, Torben A. Kruse, and Martin Jakob Larsen. Evaluation of nine somatic variant callers for detection of somatic mutations in exome and targeted deep sequencing data. PLOS ONE, 11(3):e , Li Ding, Michael C. Wendl, Daniel C. Koboldt, and Elaine R. Mardis. Analysis of next generation genomic data in cancer: accomplishments and challenges. Human Molecular Genetics, 19(R2): , Michael Gundry and Jan Vijg. Direct mutation analysis by high-throughput sequencing: from germline to low-abundant, somatic variants. Mutation Research/Fundamental and Molecular Mechanisms of Mutagenesis, 729(1):1 15, Ensel Oh, Yoon-La Choi, Mi Jeong Kwon, Ryong Nam Kim, Yu Jin Kim, and Ji-Young Song et al. Comparison of accuracy of whole-exome sequencing 211

243 with formalin-fixed paraffin-embedded and fresh frozen tissue samples. PLOS ONE, 10(12):e , Jan A. Sikorsky, Donald A. Primerano, Terry W. Fenger, and James Denvir. DNA damage reduces Taq DNA polymerase fidelity and PCR amplification efficiency. Biochemical and Biophysical Research Communications, 355(2): , Hongdo Do and Alexander Dobrovic. Dramatic reduction of sequence artefacts from DNA isolated from formalin-fixed cancer biopsies by treatment with uracil-dna glycosylase. Oncotarget, 3(5): , Hongdo Do, Stephen Q. Wong, Jason Li, and Alexander Dobrovic. Reducing sequence artifacts in amplicon-based massively parallel sequencing of formalin-fixed paraffin-embedded DNA by enzymatic depletion of uracil-containing templates. Clinical Chemistry, 59(9): , Michael Hofreiter, Viviane Jaenicke, David Serre, Arndt von Haeseler, and Svante Paabo. DNA sequences from multiple amplifications reveal artifacts induced by cytosine deamination in ancient DNA. Nucleic Acids Research, 29(23): , Juliane C. Dohm, Claudio Lottaz, Tatiana Borodina, and Heinz Himmelbauer. Substantial biases in ultra-short read data sets from high-throughput DNA sequencing. Nucleic Acids Research, 36(16):e105 e105, Frazer Meacham, Dario Boffelli, Joseph Dhahbi, David I.K. Martin, Meromit Singer, and Lior Pachter. Identification and correction of systematic error in high-throughput sequence data. BMC Bioinformatics, 12(1):451, Kensuke Nakamura, Taku Oshima, Takuya Morimoto, Shun Ikeda, Hirofumi Yoshikawa, and Yuh Shiwa et al. Sequence-specific error profile of Illumina sequencers. Nucleic Acids Research, pages 1 13, Kym M. Boycott, Megan R. Vanstone, Dennis E. Bulman, and Alex E. MacKenzie. Rare-disease genetics in the era of next-generation sequencing: discovery to translation. Nature Review Genetics, 14(10): ,

244 552 Gregory M. Cooper and Jay Shendure. Needles in stacks of needles: finding disease-causal variants in a wealth of genomic data. Genetics, 12(9): , Nature Reviews 553 Shamil R. Sunyaev. Inferring causality and functional significance of human coding DNA variants. Human Molecular Genetics, 21(R1):R10 R17, Matthew Zawistowski, Shyam Gopalakrishnan, Jun Ding, Yun Li, Sara Grimm, and Sebastian Zollner. Extending rare-variant testing strategies: analysis of noncoding sequence and imputed genotypes. American Journal of Human Genetics, 87(5): , Martin Ladouceur, Zari Dastani, Yurii S. Aulchenko, Celia M.T. Greenwood, and J. Brent Richards. The empirical power of rare variant association methods: results from sanger sequencing in 1,998 individuals. PLOS Genetics, 8(2):e , Seunggeung Lee, Goncalo R. Abecasis, Michael Boehnke, and Xihong Lin. Rare-variant association analysis: study designs and statistical tests. American Journal of Human Genetics, 95(1):5 23, Stephan Morgenthaler and William G. Thilly. A strategy to discover genes that carry multi-allelic or mono-allelic risk for common diseases: A cohort allelic sums test (CAST). Mutation Research/Fundamental and Molecular Mechanisms of Mutagenesis, 615(1-2):28 56, Bingshan Li and Suzanne M. Leal. Methods for detecting associations with rare variants for common diseases: application to analysis of sequence data. The American Journal of Human Genetics, 83(3): , Andrew S. Brohl, Rajesh Patidar, Clesson E. Turner, Xinyu Wen, Young K. Song, and Jun S. Wei et al. Frequent inactivating germline mutations in DNA repair genes in patients with Ewing sarcoma. Genetic in Medicine, Garvan Institute of Medical Research. Medical genome reference bank; URL: clinical-genomics/sydney-genomics-collaborative/mgrb,

245 561 John J. McNeil, Robyn L. Woods, Mark R. Nelson, Anne M. Murray, Christopher M. Reid, and Brenda Kirpach et al. Baseline characteristics of participants in the ASPREE (ASPirin in Reducing Events in the Elderly) study. The Journals of Gerontology, Goo Jun, Matthew Flickinger, Kurt N. Hetrick, Jane M. Romm, Kimberly F. Doheny, and Goncalo R. Abecasis et al. Detecting and estimating contamination of human DNA samples in sequencing and array-based genotype data. The American Journal of Human Genetics, 91(5): , Consortium The Genomes Project. A global reference for human genetic variation. Nature, 526(7571):68 74, GATK Documentation. Best practices for germline SNP & Indel discovery in whole genome and exome sequence; URL: GermShortWGS, Donna Karolchik, Robert Baertsch, Mark Diekhans, Terrence S. Furey, Angie Hinrichs, and Y.T. Lu et al. The UCSC genome browser database. Nucleic Acids Research, 31(1):51 54, Florian Gnad, Albion Baucom, Kiran Mukhyala, Gerard Manning, and Zemin Zhang. Assessment of computational methods for predicting the effects of missense mutations in human cancers. BMC Genomics, 14(3):S7, NCBI. EST Profile Hs C16orf96: Chromosome 16 open reading frame 96; URL: Li Liu, Jiao Huang, Ke Wang, Li Li, Yangkai Li, Jingsong Yuan, and Sheng Wei. Identification of hallmarks of lung adenocarcinoma prognosis using whole genome sequencing. Oncotarget, 6(35): ,

246 569 Desheng Xiao, Ying Shi, Chunyan Fu, Jiantao Jia, Yu Pan, and Yiqun Jiang et al. Decrease of TET2 expression and increase of 5-hmC levels in myeloid sarcomas. Leukemia Research, 42:75 79, Yu Pan, Yongguang Tao, Chunyan Fu, Jiantao Jia, Shuang Liu, and Desheng Xiao. Assessment of PET/CT in multifocal myeloid sarcomas with loss of TET2: a case report and literature review. International Journal of Clinical and Experimental Pathology, 8(10): , Pamela J. Woodring, Tony Hunter, and Jean Y.J. Wang. Regulation of F-actin-dependent processes by the Abl family of tyrosine kinases. Journal of Cell Science, 116(13): , Emma Shtivelman, Batia Lifshitz, Robert P. Gale, and Eli Canaani. Fused transcript of ABL and BCR genes in chronic myelogenous leukaemia. Nature, 216: , Richard B. Jones, Andrew Gordus, Jordan A. Krall, and Gavin MacBeath. A quantitative protein interaction network for the ErbB receptors using protein microarrays. Nature, 439(7073): , Divyamani Srinivasan and Rina Plattner. Activation of Abl tyrosine kinases promotes invasion of aggressive breast cancer cells. 66(11): , Cancer Research, 575 Liuqing Yang, Chunru Lin, and Zhi-Ren Liu. P68 RNA helicase mediates PDGF-induced epithelial mesenchymal transition by displacing Axin from B-catenin. Cell, 127(1): , Klarisa Rikova, Ailan Guo, Qingfu Zeng, Anthony Possemato, Jian Yu, and Herbert Haack et al. Global survey of phosphotyrosine signaling identifies oncogenic kinases in lung cancer. Cell, 131(6): , Jeffrey Lin, Tong Sun, Lin Ji, Wei Deng, Jack Roth, John D. Minna, and Ralph Arlinghaus. Oncogenic activation of c-abl in non-small cell lung cancer cells lacking FUS1 expression: inhibition of c-abl by the tumor suppressor gene product Fus1. Oncogene, 26(49): ,

247 578 Chang-Jiun Wu, Tianxi Cai, Klarisa Rikova, David Merberg, Simon Kasif, and Martin Steffen. A predictive phosphorylation signature of lung cancer. PLOS ONE, 4(11):e7994, Julian Carretero, Takeshi Shimamura, Klarisa Rikova, Autumn L. Jackson, Matthew D. Wilkerson, and Christa L. Borgman et al. Integrative genomic and proteomic analyses identify targets for Lkb1-deficient metastatic lung tumors. Cancer Cell, 17(6): , Justin M. Drake, Nicholas A. Graham, Tanya Stoyanova, Amir Sedghi, Andrew S. Goldstein, and Houjian Cai et al. Oncogene-specific activation of tyrosine kinase networks during prostate cancer progression. Proceedings of the National Academy of Sciences, 109(5): , Alessandro Furlan, Venturina Stagni, Azeemudeen Hussain, Sylvie Richelme, Filippo Conti, and Andrea Prodosmo et al. Abl interconnects oncogenic Met and p53 core pathways in cancer cells. Cell Death & Differentiation, 18(10): , Sourik S. Ganguly, Leann S. Fiore, Jonathan T. Sims, J. Woodrow Friend, Divyamani Srinivasan, and Matthew A. Thacker et al. c-abl and Arg are activated in human primary melanomas, promote melanoma cell invasion via distinct pathways, and drive metastatic progression. Oncogene, 31(14): , Junaid Ansari, Abdul Rafeh Naqash, Reinhold Munker, Hazem El-Osta, Samip Master, and James D. Cotelingam et al. Histiocytic sarcoma as a secondary malignancy: pathobiology, diagnosis, and treatment. European Journal of Haematology, 97(1):9 16, Xueyan Chen, Joe C. Rutledge, David Wu, Min Fang, Kent E. Opheim, and Min Xu. Chronic myelogenous leukemia presenting in blast phase with nodal, bilineal myeloid sarcoma and t-lymphoblastic lymphoma in a child. Pediatric and Developmental Pathology, 16(2):91 96, Brian L. Samuels, Sant Chawla, Shreyaskumar Patel, Margaret von Mehren, Jeremy Hamm, and Pamela E. Kaiser et al. Clinical outcomes and safety with trabectedin therapy in patients with advanced soft tissue sarcomas 216

248 following failure of prior chemotherapy: results of a worldwide expanded access program study. Annals of Oncology, 24(6): , Fernando A. Angarita, Amanda J. Cannell, Albiruni R. Abdul Razak, Brendan C. Dickson, and Martin E. Blackstein. Trabectedin for inoperable or recurrent soft tissue sarcoma in adult patients: a retrospective cohort study. BMC Cancer, 16(1):30 41, Kira Bramswig, Ferdinand Ploner, Alexandra Martel, Thomas Bauernhofer, Wolfgang Hilbe, and Thomas Kuhr et al. Sorafenib in advanced, heavily pretreated patients with soft tissue sarcomas. Anti-Cancer Drugs, 25(7): , Armando Santoro, Alessandro Comandone, Umberto Basso, Hector Soto Parra, Rita De Sanctis, and Elisa Stroppa et al. Phase II prospective study with sorafenib in advanced soft tissue sarcomas after anthracycline-based therapy. Annals of Oncology, 24(4): , Bo Eskerod Madsen and Sharon R. Browning. A groupwise association test for rare mutations using a weighted sum statistic. 5(2):e , PLOS Genetics, 590 Ya-Jing Zhou, Yong Wang, and Li-Li Chen. Detecting the common and individual effects of rare variants on quantitative traits by using extreme phenotype sampling. Genes, 7(1):2 14, Benjamin M. Neale, Manuel A. Rivas, Benjamin F. Voight, David Altshuler, Bernie Devlin, and Marju Orho-Melander et al. Testing for an unusual distribution of rare variants. PLOS Genetics, 7(3):e , Seunggeun Lee, Michael C. Wu, and Xihong Lin. Optimal tests for rare variant effects in sequencing association studies. Biostatistics, 13(4): , Andriy Derkach, Jerry F. Lawless, and Lei Sun. Robust and powerful tests for rare variants using Fisher s method to combine evidence of association from two or more complementary tests. Genetic Epidemiology, 37(1): ,

249 594 Satu Maki-Nevala, Virinder Kaur Sarhadi, Aija Knuuttila, Ilari Scheinin, Pekka Ellonen, and Sonja Lagstrom et al. Driver gene and novel mutations in asbestos-exposed lung adenocarcinoma and malignant mesothelioma detected by exome sequencing. Lung, 194(1): , Robbert D.A. Weren, Marjolijn J.L. Ligtenberg, C. Marleen Kets, Richarda M. de Voer, Eugene T.P. Verwiel, and Liesbeth Spruijt et al. A germline homozygous mutation in the base-excision repair gene NTHL1 causes adenomatous polyposis and colorectal cancer. Nature Genetics, 47(6): , Oriol Calvete, Jose Reyes, Sheila Zuniga, Beatriz Paumard-Hernandez, Victoria Fernandez, and Luis Bujanda et al. Exome sequencing identifies ATP4A gene as responsible of an atypical familial type I gastric neuroendocrine tumour. Human Molecular Genetics, 24(10): , Michael W. Ronellenfitsch, Oh Ji Eun, Kaishi Satomi, Koichiro Sumi, Patrick N. Harter, and Joachim P. Steinbach et al. CASP9 germline mutation in a family with multiple brain tumors. Brain Pathology, pages 1 22, Cezary Cybulski, Jian Carrot-Zhang, Wojciech Kluzniak, Barbara Rivera, Aniruddh Kashyap, and Dominika Wokolorczyk et al. Germline RECQL mutations are associated with breast cancer susceptibility. Nature Genetics, 47(6): , Jie Sun, Yuxia Wang, Yisui Xia, Ye Xu, Tao Ouyang, and Jinfeng Li et al. Mutations in RECQL gene are associated with predisposition to breast cancer. PLOS Genetics, 11(5):e , Johanna I. Kiiski, Liisa M. Pelttari, Sofia Khan, Edda S. Freysteinsdottir, Inga Reynisdottir, and Steven N. Hart et al. Exome sequencing identifies FANCM as a susceptibility gene for triple-negative breast cancer. Proceedings of the National Academy of Sciences, 111(42): , Francisco Javier Gracia-Aznarez, Victoria Fernandez, Guillermo Pita, Paolo Peterlongo, Orlando Dominguez, and Miguel de la Hoya et al. Whole exome sequencing suggests much of non-brca1/brca2 familial breast cancer 218

250 is due to moderate and low penetrance susceptibility alleles. PLOS ONE, 8(2):e55681, Paolo Peterlongo, Irene Catucci, Mara Colombo, Laura Caleca, Eliseos Mucaki, and Massimo Bogliolo et al. FANCM c.5791c>t nonsense mutation (rs ) induces exon skipping, affects DNA repair activity and is a familial breast cancer risk factor. Human Molecular Genetics, 24(18): , Daniel J. Park, Kayoko Tao, Florence Le Calvez-Kelm, Tu Nguyen-Dumont, Nivonirina Robinot, and Fleur Hammet et al. Rare mutations in RINT1 predispose carriers to breast and Lynch syndrome-spectrum cancers. Cancer Discovery, 4(7): , Ella R. Thompson, Maria A. Doyle, Georgina L. Ryland, Simone M. Rowley, David Y. H. Choong, and Richard W. Tothill et al. Exome sequencing identifies rare deleterious mutations in DNA repair genes FANCC and BLM as potential breast cancer susceptibility alleles. PLOS Genetics, 8(9):e , Anna P. Sokolenko, Aglaya G. Iyevleva, Elena V. Preobrazhenskaya, Nathalia V. Mitiushkina, Svetlana N. Abysheva, and Evgeny N. Suspitsin et al. High prevalence and breast cancer predisposing role of the BLM c C> T (Q548X) mutation in Russia. International Journal of Cancer, 130(12): , Darya Prokofyeva, Natalia Bogdanova, Natalia Dubrowinskaja, Marina Bermisheva, Zalina Takhirova, and Natalia Antonenkova et al. Nonsense mutation p.q548x in BLM, the gene mutated in Bloom s syndrome, is associated with breast cancer in Slavic populations. Breast Cancer Research and Treatment, 137(2): , Marianne Berwick, Jaya M. Satagopan, Leah Ben-Porat, Ann Carlson, Katherine Mah, and Rashida Henry et al. Genetic heterogeneity among fanconi anemia heterozygotes and risk of cancer. Cancer Research, 67(19): ,

251 608 Daniel J. Park, Fabienne Lesueur, Tu Nguyen-Dumont, Maroulio Pertesi, Fabrice Odefrey, and F. Hammet et al. Rare mutations in XRCC2 increase the risk of breast cancer. The American Journal of Human Genetics, 90(4): , Kirsi Maatta, Tommi Rantapero, Anna Lindstrom, Matti Nykter, Minna Kankuri-Tammilehto, Satu-Leena Laasanen, and Johanna Schleutker. Whole-exome sequencing of Finnish hereditary breast cancer families. European Journal of Human Genetics, Abdelkader Heddar, Pierre Fermey, Sophie Coutant, Emilie Angot, Jean-Christophe Sabourin, and Paul Michelin et al. Familial solitary chondrosarcoma resulting from germline EXT2 mutation. Genes, Chromosomes and Cancer, Lynn R. Goldin, Mary L. McMaster, Melissa Rotunno, Sarah E.M. Herman, Kristine Jones, and Bin Zhu et al. Whole exome sequencing in families with CLL detects a variant in Integrin B 2 associated with disease susceptibility. Blood, 128(18): , Helen E. Speedy, Ben Kinnersley, Daniel Chubb, Peter Broderick, Philip J. Law, and Kevin Litchfield et al. Germline mutations in shelterin complex genes are associated with familial chronic lymphocytic leukemia. Blood, Jun-Xiao Zhang, Lei Fu, Richarda M. de Voer, Marc-Manuel Hahn, Peng Jin, and Chen-Xi Lv et al. Candidate colorectal cancer predisposing gene variants in Chinese early-onset and familial cases. World Journal of Gastroenterology, 21(14): , Nuria Segui, Leonardo B. Mina, Conxi Lazaro, Rebeca Sanz-Pamplona, Tirso Pons, and Matilde Navarro et al. Germline mutations in FAN1 cause hereditary colorectal cancer by impairing DNA repair. Gastroenterology, 149(3): , Clara Esteban-Jurado, Maria Vila-Casadesus, Pilar Garre, Juan Jose Lozano, Anna Pristoupilova, and Sergi Beltran et al. Whole-exome sequencing identifies rare pathogenic variants in new predisposition genes for familial colorectal cancer. Genetics in Medicine,

252 616 Taina T. Nieminen, Marie-Francoise O Donohue, Yunpeng Wu, Hannes Lohi, Stephen W. Scherer, and Andrew D. Paterson et al. Germline mutation of RPS20, encoding a ribosomal protein, causes predisposition to hereditary nonpolyposis colorectal carcinoma without DNA mismatch repair deficiency. Gastroenterology, 147(3): e5, Alexandra E. Gylfe, Riku Katainen, Johanna Kondelin, Tomas Tanskanen, Tatiana Cajuso, and Ulrika Hanninen et al. Eleven candidate susceptibility genes for common familial colorectal cancer. PLOS Genetics, 9(10):e , Pi-Yueh Chang, Jinn-Shiun Chen, Nai-Chung Chang, Shih-Cheng Chang, Mei-Chia Wang, and Shu-Hui Tsai et al. NRAS germline variant G138R and multiple rare somatic mutations on APC in colorectal cancer patients in Taiwan by next generation sequencing. Oncotarget, 7(25): , Daniel Chubb, Peter Broderick, Sara E. Dobbins, Matthew Framptom, Ben Kinnersley, and Steven Penegar et al. Rare disruptive mutations and their contribution to the heritable risk of colorectal cancer. Nature Communications, Richarda M. de Voer, Marc-Manuel Hahn, Robbert D.A. Weren, Arjen R. Mensenkamp, Christian Gilissen, and Wendy A. van Zelst-Stams et al. Identification of novel candidate genes for early-onset colorectal cancer susceptibility. PLOS Genetics, 12(2):e , Claire Palles, Jean-Baptiste Cazier, Kimberley M. Howarth, Enric Domingo, Angela M. Jones, and Peter Broderick et al. Germline mutations affecting the proofreading domains of POLE and POLD1 predispose to colorectal adenomas and carcinomas. Nature Genetics, 45(2): , Christopher G. Smith, Marc Naven, Rebecca Harris, James Colley, Hannah West, and Ning Li et al. Exome resequencing identifies potential tumor-suppressor genes that predispose to colorectal cancer. Human Mutation, 34(7): ,

253 623 Anna Rohlin, Theofanis Zagoras, Staffan Nilsson, Ulf Lundstam, Jan Wahlstrom, and Leif Hulten et al. A mutation in POLE predisposing to a multi-tumour phenotype. International Journal of Oncology, 45(1):77 81, Laura Valle, Eva Hernandez-Illan, Fernando Bellido, Gemma Aiza, Adela Castillejo, and Maria-Isabel Castillejo et al. New insights into POLE and POLD1 germline mutations in familial colorectal cancer and polyposis. Human Molecular Genetics, 23(13): , Fernando Bellido, Marta Pineda, Gemma Aiza, Rafael Valdes-Mas, Matilde Navarro, and Diana A. Puente et al. POLE and POLD1 mutations in 529 kindred with familial colorectal cancer and/or polyposis: review of reported cases and recommendations for genetic testing and surveillance. Genetics in Medicine, 18(4): , Daniel Chubb, Peter Broderick, Matthew Frampton, Ben Kinnersley, Amy Sherborne, and Steven Penegar et al. Genetic diagnosis of high-penetrance susceptibility for colorectal cancer (CRC) is achievable for a high proportion of familial CRC by exome sequencing. Journal of Clinical Oncology, 33(5): , Fadwa A. Elsayed, C. Marleen Kets, Dina Ruano, Brendy van den Akker, Arjen R. Mensenkamp, and Melanie Schrumpf et al. Germline variants in POLE are associated with early onset mismatch repair deficient colorectal cancer. European Journal of Human Genetics, 23(8): , Maren F. Hansen, Jostein Johansen, Inga Bjornevoll, Anna E. Sylvander, Kristin S. Steinsbekk, and Pal Saetrom et al. A novel POLE mutation associated with cancers of colon, pancreas, ovaries and small intestine. Familial Cancer, 14(3): , Isabel Spier, Stefanie Holzapfel, Janine Altmuller, Bixiao Zhao, Sukanya Horpaopan, and Stefanie Vogt et al. Frequency and phenotypic spectrum of germline mutations in POLE and seven other polymerase genes in 266 patients with colorectal adenomas and carcinomas. International Journal of Cancer, 137(2): ,

254 630 Yael Goldberg, Naama Halpern, Ayala Hubert, Samuel N. Adler, Sherri Cohen, and Morasha Plesser-Duvdevani et al. Mutated MCM9 is associated with predisposition to hereditary mixed polyposis and colorectal cancer in addition to primary ovarian failure. Cancer Genetics, 208(12): , Ronja Adam, Isabel Spier, Bixiao Zhao, Michael Kloth, Jonathan Marquez, and Inga Hinrichsen et al. Exome sequencing identifies biallelic MSH3 germline mutations as a recessive subtype of colorectal adenomatous polyposis. The American Journal of Human Genetics, 99(2): , Isabel Spier, Martin Kerick, Dmitriy Drichel, Sukanya Horpaopan, Janine Altmuller, and Andreas Laner et al. Exome sequencing identifies potential novel candidate genes in patients with unexplained colorectal adenomatous polyposis. Familial Cancer, 15(2): , Ryan E. Fecteau, Jianping Kong, Adam Kresak, Wendy Brock, Yeunjoo Song, and Hisashi Fujioka et al. Association between germline mutation in VSIG10L and familial Barrett neoplasia. JAMA Oncology, 2(10): , Caixia Cheng, Heyang Cui, Ling Zhang, Zhiwu Jia, Bin Song, and Fang Wang et al. Genomic analyses reveal FAM84B and the NOTCH pathway are associated with the progression of esophageal squamous cell carcinoma. GigaScience, 5(1):1, Keqiang Zhang, Jia-Wei Lin, Jinhui Wang, Xiwei Wu, Hanlin Gao, and Yi-Chen Hsieh et al. A germline missense mutation in COQ6 is associated with susceptibility to familial schwannomatosis. Genetic Medicine, 16(10): , Iikki Donner, Tuula Kiviluoto, Ari Ristimaki, Lauri A. Aaltonen, and Pia Vahteristo. Exome sequencing reveals three novel candidate predisposition genes for diffuse gastric cancer. Familial Cancer, 14(2): , Ian J. Majewski, Irma Kluijt, Annemieke Cats, Thomas S. Scerri, Daphne de Jong, and Roelof J.C. Kluin et al. An a-e-catenin (CTNNA1) mutation in hereditary diffuse gastric cancer. The Journal of Pathology, 229(4): ,

255 638 Samantha Hansford, Pardeep Kaurah, Hector Li-Chang, Michelle Woo, Janine Senz, and Hugo Pinheiro et al. Hereditary diffuse gastric cancer syndrome: CDH1 mutations and beyond. JAMA Oncology, 1(1):23 32, Matthew N. Bainbridge, Georgina N. Armstrong, M. Monica Gramatges, Alison A. Bertuch, Shalini N. Jhangiani, and Harsha Doddapaneni et al. Germline mutations in shelterin complex genes are associated with familial glioma. Journal of the National Cancer Institute, 107(1):dju384, Heikki Ristolainen, Outi Kilpivaara, Peter Kamper, Minna Taskinen, Silva Saarinen, and Sirpa Leppa et al. Identification of homozygous deletion in ACAN and other candidate variants in familial classical Hodgkin lymphoma by exome sequencing. British Journal of Haematology, 170(3): , Silva Saarinen, Mervi Aavikko, Kristiina Aittomaki, Virpi Launonen, Rainer Lehtonen, and Kaarle Franssila et al. Exome sequencing reveals germline NPAT mutation as a candidate risk factor for Hodgkin lymphoma. Blood, 118(3): , Melissa Rotunno, Mary L. McMaster, Joseph Boland, Sara Bass, Xijun Zhang, and Laurie Burdett et al. Whole exome sequencing in families at high risk for Hodgkin lymphoma: identification of a predisposing mutation in the KDR gene. Haematologica, 101(7):853, Natalia D. Linhares, Maira C.M. Freire, Raony G.C.C.L. Cardenas, Heloisa B. Pena, Magda Bahia, and Sergio D.J. Pena. Exome sequencing identifies a novel homozygous variant in NDRG4 in a family with infantile myofibromatosis. European Journal of Medical Genetics, 57(11-12): , Yee Him Cheung, Tenzin Gayden, Philippe M. Campeau, Charles A. LeDuc, Donna Russo, and Van-Hung Nguyen et al. A recurrent PDGFRB mutation causes familial infantile myofibromatosis. The American Journal of Human Genetics, 92(6): , John A. Martignetti, Lifeng Tian, Dong Li, Maria Celeste M. Ramirez, Olga Camacho-Vanegas, and Sandra Catalina Camacho et al. Mutations 224

256 in PDGFRB cause autosomal-dominant infantile myofibromatosis. American Journal of Human Genetics, 92(6): , The 646 Xiaolei Lan, Hua Gao, Fei Wang, Jie Feng, Jiwei Bai, and Peng Zhao et al. Whole-exome sequencing identifies variants in invasive pituitary adenomas. Oncology Letters, 12(4): , Joanne Ngeow, Wanfeng Yu, Lamis Yehia, Farshad Niazi, Jinlian Chen, and Xuhua Tang et al. Exome sequencing reveals germline SMAD9 mutation that reduces phosphatase and tensin homolog expression and is associated with hamartomatous polyposis and gastrointestinal ganglioneuromas. Gastroenterology, 149(4): , Mervi Aavikko, Eevi Kaasinen, Janne K. Nieminen, Minji Byun, Iikki Donner, and Roberta Mancuso et al. Whole-genome sequencing identifies STAT4 as a putative susceptibility gene in classic Kaposi sarcoma. Journal of Infectious Diseases, 211(11): , Sho Egashira, Masatoshi Jinnin, Miho Harada, Shinichi Masuguchi, Satoshi Fukushima, and Hironobu Ihn. Exome sequence analysis of Kaposiform hemangioendothelioma: identification of putative driver mutations. Anais Brasileiros de Dermatologia, 91(6): , Stefano Caruso, Julien Calderaro, Eric Letouze, Jean-Charles Nault, Gabrielle Couchy, and Anais Boulais et al. Germline and somatic DICER1 mutations in familial and sporadic liver tumors. Journal of Hepatology, 66(4): , Donghai Xiong, Yian Wang, Elena Kupert, Claire Simpson, Susan M. Pinney, and Colette R. Gaba et al. A recurrent mutation in PARK2 is associated with familial lung cancer. The American Journal of Human Genetics, 96(2): , Hsuan-Yu Chen, Sung-Liang Yu, Bing-Ching Ho, Kang-Yi Su, Yi-Chiung Hsu, and Chi-Sheng Chang et al. R331W missense mutation of oncogene YAP1 is a germline risk allele for lung adenocarcinoma with medical actionability. Journal of Clinical Oncology, 33(20): ,

257 653 Makia J. Marafie, Mohammed Dashti, and Fahd Al-Mulla. Identification of a rare germline NBN gene mutation by whole exome sequencing in a lung-cancer survivor from a large family with various types of cancer. Familial Cancer, pages 1 6, Leila Noetzli, Richard W. Lo, Alisa B. Lee-Sherick, Michael Callaghan, Patrizia Noris, and Anna Savoia et al. Germline mutations in ETV6 are associated with thrombocytopenia, red cell macrocytosis and predisposition to lymphoblastic leukemia. Nature Genetics, 47(5): , Sabine Topka, Joseph Vijai, Michael F. Walsh, Lauren Jacobs, Ann Maria, and Danylo Villano et al. Germline ETV6 mutations confer susceptibility to acute lymphoblastic leukemia and thrombocytopenia. PLOS Genetics, 11(6):e , Valentina Silvestri, Veronica Zelli, Virginia Valentini, Piera Rizzolo, Anna Sara Navazio, and Anna Coppa et al. Whole-exome sequencing and targeted gene sequencing provide insights into the role of PALB2 as a male breast cancer susceptibility gene. Cancer, 123(2): , Carla Daniela Robles-Espinoza, Mark Harland, Andrew J. Ramsay, Lauren G. Aoude, Victor Quesada, and Zhihao Ding et al. POT1 loss-of-function variants predispose to familial melanoma. Nature Genetics, 46(5): , Satoru Yokoyama, Susan L. Woods, Glen M. Boyle, Lauren G. Aoude, Stuart MacGregor, and Victoria Zismann et al. A novel recurrent mutation in MITF predisposes to familial and sporadic melanoma. Nature, 480(7375):99 103, Paola Ghiorzo, Lorenza Pastorino, Paola Queirolo, William Bruno, Maria G. Tibiletti, and Sabina Nasti et al. Prevalence of the E318K MITF germline mutation in Italian melanoma patients: associations with histological subtypes and family cancer history. Pigment Cell & Melanoma Research, 26(2): , Marianne Berwick, Jamie MacArthur, Irene Orlow, Peter Kanetsky, Colin B. Begg, and Li Luo et al. MITF E318K s effect on melanoma risk independent 226

258 of, but modified by, other risk factors. Pigment Cell & Melanoma Research, 27(3): , J. William Harbour, Michael D. Onken, Elisha D.O. Roberson, Shenghui Duan, Li Cao, and Lori A. Worley et al. Frequent mutation of BAP1 in metastasizing uveal melanomas. Science, 330(6009): , Joseph R. Testa, Mitchell Cheung, Jianming Pei, Jennifer E. Below, Yinfei Tan, and Eleonora Sementino et al. Germline BAP1 mutations predispose to malignant mesothelioma. Nature Genetics, 43(10): , Thomas Wiesner, Anna C. Obenauf, Rajmohan Murali, Isabella Fried, Klaus G. Griewank, and Peter Ulz et al. Germline mutations in BAP1 predispose to melanocytic tumors. Nature Genetics, 43(10): , Mohamed H. Abdel-Rahman, Robert Pilarski, Colleen M. Cebulla, James B. Massengill, Benjamin N. Christopher, and Getachew Boru et al. Germline BAP1 mutation predisposes to uveal melanoma, lung adenocarcinoma, meningioma, and other cancers. Journal of Medical Genetics, 48(12): , Lauren G. Aoude, Karin Wadt, Anders Bojesen, Dorthe Cruger, Ake Borg, and Jeffrey M. Trent et al. A BAP1 mutation in a Danish family predisposes to uveal melanoma and other cancers. PLOS ONE, 8(8):e72144, Mitchell Cheung, Jacqueline Talarchek, Karen Schindeler, Eduardo Saraiva, Lynette S. Penney, Mark Ludman, and Joseph R. Testa. Further evidence for germline BAP1 mutations predisposing to melanoma and malignant mesothelioma. Cancer Genetics, 206(5): , Megan N. Farley, Laura S. Schmidt, Jessica L. Mester, Samuel Pena-Llopis, Andrea Pavia-Jimenez, and Alana Christie et al. A novel germline mutation in BAP1 predisposes to familial clear-cell renal cell carcinoma. Molecular Cancer Research, 11(9): , Tatiana Popova, Lucie Hebert, Virginie Jacquemin, Sophie Gad, Virginie Caux-Moncoutier, and Catherine Dubois-d Enghien et al. Germline BAP1 mutations predispose to renal cell carcinomas. The American Journal of Human Genetics, 92(6): ,

259 669 David A. Maerker, Michael Zeschnigk, Jasmin Nelles, Dietmar R. Lohmann, Karl Worm, and Anja K. Bosserhoff et al. BAP1 germline mutation in two first grade family members with uveal melanoma. British Journal of Ophthalmology, 98(2): , Robert Pilarski, Colleen M. Cebulla, James B. Massengill, Karan Rai, Thereasa Rich, and Louise Strong et al. Expanding the clinical phenotype of hereditary BAP1 cancer predisposition syndrome, reporting three new cases. Genes, Chromosomes and Cancer, 53(2): , Colleen M. Cebulla, Elaine M. Binkley, Robert Pilarski, James B. Massengill, Karan Rai, and David A. Liebner et al. Analysis of BAP1 germline gene mutation in young uveal melanoma patients. Ophthalmic Genetics, 36(2): , Arnaud de la Fouchardiere, Odile Cabaret, Liliana Savin, Patrick Combemale, Hubert Schvartz, and Clotilde Penet et al. Germline BAP1 mutations predispose also to multiple basal cell carcinomas. Clinical Genetics, 88(3): , Sonja Klebe, Jack Driml, Masaki Nasu, Sandra Pastorino, Amirmasoud Zangiabadi, Douglas Henderson, and Michele Carbone. BAP1 hereditary cancer predisposition syndrome: a case report and review of literature. Biomarker Research, 3(1):14, Pedram Gerami, Oriol Yelamos, Christina Y. Lee, Roxana Obregon, Pedram Yazdan, and Lauren M. Sholl et al. Multiple cutaneous melanomas and clinically atypical moles in a patient with a novel germline BAP1 mutation. JAMA Dermatology, 151(11): , Karan Rai, Robert Pilarski, Colleen M. Cebulla, and Mohamed H. Abdel-Rahman. Comprehensive review of BAP1 tumor predisposition syndrome with report of two new cases. Clinical Genetics, 89(3): , Karin A.W. Wadt, Lauren G. Aoude, Peter Johansson, Annalisa Solinas, Antonia L. Pritchard, and Oana Crainic et al. A recurrent germline BAP1 228

260 mutation and extension of the BAP1 tumor predisposition spectrum to include basal cell carcinoma. Clinical Genetics, 88(3): , David J. Barnes, Edward Hookway, Nick Athanasou, Takeshi Kashima, Udo Oppermann, and Simon Hughes et al. A germline mutation of CDKN2A and a novel RPLP1-C19MC fusion detected in a rare melanotic neuroectodermal tumor of infancy: a case report. BMC Cancer, 16(1):629, Miriam J. Smith, James O Sullivan, Sanjeev S. Bhaskar, Kristen D. Hadfield, Gemma Poke, and John Caird et al. Loss-of-function mutations in SMARCE1 cause an inherited disorder of multiple spinal meningiomas. Nature Genetics, 45(3): , Miriam J. Smith, Andrew J. Wallace, Chris Bennett, Martin Hasselblatt, Ewelina Elert-Dobkowska, and Linton T. Evans et al. Germline SMARCE1 mutations predispose to both spinal and cranial clear cell meningiomas. The Journal of Pathology, 234(4): , Helen Raffalli-Ebezant, Scott A. Rutherford, Stavros Stivaros, Anna Kelsey, Miriam Smith, D. Gareth Evans, and John-Paul Kilday. Pediatric intracranial clear cell meningioma associated with a germline mutation of SMARCE1: a novel case. Child s Nervous System, 31(3): , Linton T. Evans, Jack Van Hoff, William F. Hickey, Miriam J. Smith, D. Gareth Evans, William G. Newman, and David F. Bauer. SMARCE1 mutations in pediatric clear cell meningioma: case report. Journal of Neurosurgery: Pediatrics, 16(3): , Wei Dai, Hong Zheng, Arthur Kwok Leung Cheung, Clara Sze-man Tang, Josephine Mun Yee Ko, and Bonnie Wing Yan Wong et al. Whole-exome sequencing identifies MST1R as a genetic susceptibility gene in nasopharyngeal carcinoma. Proceedings of the National Academy of Sciences, 113(12): , Sudheer Kumar Gara, Li Jia, Maria J. Merino, Sunita K. Agarwal, Lisa Zhang, and Maggie Cam et al. Germline HABP2 mutation causing familial nonmedullary thyroid cancer. New England Journal of Medicine, 373(5): ,

261 684 Chang Liu, Yang Yu, Guangliang Yin, Junxia Zhang, Wei Wen, and Xianhui Ruan et al. C14orf93 (RTFC) is identified as a novel susceptibility gene for familial nonmedullary thyroid cancer. Biochemical and Biophysical Research Communications, pages 1 7, Ed Dicks, Honglin Song, Susan J. Ramus, Elke Van Oudenhove, Jonathan P. Tyrer, and Maria P. Intermaggio et al. Germline whole exome sequencing and large-scale replication identifies FANCM as a likely high grade serous ovarian cancer susceptibility gene. Oncotarget, Silvia Vilarinho, E. Zeynep Erson-Omay, Akdes Serin Harmanci, Raffaella Morotti, Geneive Carrion-Grant, and Jacob Baranoski et al. Paediatric hepatocellular carcinoma due to somatic CTNNB1 and NFE2L2 mutations in the setting of inherited bi-allelic ABCB11 mutations. Journal of Hepatology, 61(5): , Filemon S. Dela Cruz, Daniel Diolaiti, Andrew T. Turk, Allison R. Rainey, Alberto Ambesi-Impiombato, and Stuart J. Andrews et al. A case study of an integrative genomic and experimental therapeutic approach for rare tumors: identification of vulnerabilities in a pediatric poorly differentiated carcinoma. Genome Medicine, 8(1):116, Yoko Shimada, Takashi Kohno, Hideki Ueno, Yoshinori Ino, Hideyuki Hayashi, and Takashi Nakaoku et al. An oncogenic ALK fusion and an RRAS mutation in KRAS mutation-negative pancreatic ductal adenocarcinoma. The Oncologist, 22(2): , Jerneja Tomsic, Huiling He, Keiko Akagi, Sandya Liyanarachchi, Qun Pan, and Blake Bertani et al. A germline mutation in SRRM2, a splicing factor gene, is implicated in papillary thyroid carcinoma predisposition. Scientific Reports, 5, Alberto Cascon, Inaki Comino-Mendez, Maria Curras-Freixes, Aguirre A. de Cubas, Laura Contreras, and Susan Richter et al. Whole-exome sequencing identifies MDH2 as a new familial paraganglioma gene. Journal of the National Cancer Institute, 107(5):djv053,

262 691 Andrew Feber, Daniel C. Worth, Ankur Chakravarthy, Patricia de Winter, Kunal Shah, and Manit Arya et al. Somatic mutations in penile squamous cell carcinoma. Cancer Research, 76(16): , Inaki Comino-Mendez, Francisco J. Gracia-Aznarez, Francesca Schiavi, Inigo Landa, Luis J. Leandro-Garcia, and Rocio Leton et al. Exome sequencing identifies MAX mutations as a cause of hereditary pheochromocytoma. Nature Genetics, 43(7): , Nelly Burnichon, Alberto Cascon, Francesca Schiavi, Nicole Paes Morales, Inaki Comino-Mendez, and Nassera Abermil et al. Mutations cause hereditary and sporadic pheochromocytoma and paraganglioma. Clinical Cancer Research, 18(10): , Mariola Peczkowska, Aldona Kowalska, Jacek Sygut, Dariusz Waligorski, Angelica Malinoc, and Hanna Janaszek-Sitkowska et al. Testing new susceptibility genes in the cohort of apparently sporadic phaeochromocytoma/paraganglioma patients with clinical characteristics of hereditary syndromes. Clinical Endocrinology, 79(6): , Sohela Shah, Kasmintan A. Schrader, Esme Waanders, Andrew E. Timms, Joseph Vijai, and Cornelius Miething et al. A recurrent germline PAX5 mutation confers susceptibility to pre-b cell acute lymphoblastic leukemia. Nature Genetics, 45(10): , Yuji Ikeda, Kazuma Kiyotani, Poh Yin Yew, Taigo Kato, Kenji Tamura, and Kai Lee Yap et al. Germline PARP4 mutations in patients with primary thyroid and breast cancers. Endocrine-Related Cancer, 23(3): , Pia Ostergaard, Michael A. Simpson, Fiona C. Connell, Colin G. Steward, Glen Brice, and Wesley J. Woollard et al. Mutations in GATA2 cause primary lymphedema associated with a predisposition to acute myeloid leukemia (Emberger syndrome). Nature Genetics, 43(10): , Christopher N. Hahn, Chan-Eng Chong, Catherine L. Carmichael, Ella J. Wilkins, Peter J. Brautigan, and Xiao-Chun Li et al. Heritable GATA2 mutations associated with familial myelodysplastic syndrome and acute myeloid leukemia. Nature Genetics, 43(10): ,

263 699 Harriet Holme, Upal Hossain, Michael Kirwan, Amanda Walne, Tom Vulliamy, and Inderjeet Dokal. Marked genetic heterogeneity in familial myelodysplasia/acute myeloid leukaemia. British Journal of Haematology, 158(2): , Jan Kazenwadel, Genevieve A. Secker, Yajuan J. Liu, Jill A. Rosenfeld, Robert S. Wildin, and Jennifer Cuellar-Rodriguez et al. Loss-of-function germline GATA2 mutations in patients with MDS/AML or MonoMAC syndrome and primary lymphedema reveal a key role for GATA2 in the lymphatic vasculature. Blood, 119(5): , Marlene Pasquet, Christine Bellanne-Chantelot, Suzanne Tavitian, Nais Prade, Blandine Beaupain, and Olivier LaRochelle et al. High frequency of GATA2 mutations in patients with mild chronic neutropenia evolving to MonoMac syndrome, myelodysplasia, and acute myeloid leukemia. Blood, 121(5):822, Juehua Gao, Ryan D. Gentzler, Andrew E. Timms, Marshall S. Horwitz, Olga Frankfurt, Jessica K. Altman, and LoAnn C. Peterson. Heritable GATA2 mutations associated with familial AML-MDS: a case report and review of literature. Journal of Hematology & Oncology, 7(1):36, Christopher N. Hahn, Peter J. Brautigan, Chan-Eng Chong, Alex Janssan, Parameswaran Venugopal, and Young Lee et al. Characterisation of a compound in-cis GATA2 germline mutation in a pedigree presenting with myelodysplastic syndrome/acute myeloid leukemia with concurrent thrombocytopenia. Leukemia, 29(8): , Xinan Wang, Hideki Muramatsu, Yusuke Okuno, Hirotoshi Sakaguchi, Kenichi Yoshida, and Nozomu Kawashima et al. GATA2 and secondary mutations in familial myelodysplastic syndromes and pediatric myeloid malignancies. Haematologica, 100(10):e398, Liesel M. FitzGerald, Akash Kumar, Evan A. Boyle, Yuzheng Zhang, Laura M. McIntosh, and Suzanne Kolb et al. Germline missense variants in the BTNL2 gene are associated with prostate cancer susceptibility. Cancer Epidemiology Biomarkers & Prevention, 22(9): ,

264 706 Daniel C. Koboldt, Krishna L. Kanchi, Bin Gui, David E. Larson, Robert S. Fulton, and William B. Isaacs et al. Rare variation in TET2 is associated with clinically relevant prostate carcinoma in African Americans. Cancer Epidemiology Biomarkers & Prevention, 25(11): , Takahide Hayano, Hiroshi Matsui, Hirofumi Nakaoka, Nobuaki Ohtake, Kazuyoshi Hosomichi, Kazuhiro Suzuki, and Ituro Inoue. Germline variants of prostate cancer in Japanese families. PLOS ONE, 11(10):e , Danielle M. Karyadi, Milan S. Geybels, Eric Karlins, Brennan Decker, Laura McIntosh, and Amy Hutchinson et al. Whole exome sequencing in 75 high-risk families with validation and replication in independent case-control studies identifies TANGO2, OR5H14, and CHAD as new prostate cancer susceptibility genes. Oncotarget, 8(1): , Sally M. Hunter, Simone M. Rowley, David Clouston, Jason Li, Richard Lupat, and Nishanth Krishnananthan et al. Searching for candidate genes in familial BRCAX mutation carriers with prostate cancer. Urologic Oncology: Seminars and Original Investigations, 34(3):120.e9 120.e16, Krinio Giannikou, Izabela A. Malinowska, Trevor J. Pugh, Rachel Yan, Yuen-Yi Tseng, and Coyin Oh et al. Whole exome sequencing identifies TSC1/TSC2 biallelic loss as the primary and sufficient driver event for renal angiomyolipoma development. PLOS Genetics, 12(8):e , Jinwoo Ahn, Kyung Seok Han, Jun Hyeok Heo, Duhee Bang, You Hyun Kang, and Hyun A. Jin et al. FOXC2 and CLIP4: a potential biomarker for synchronous metastasis of < 7-cm clear cell renal cell carcinomas. Oncotarget, 7(32): , Frank Y. Lin, Katie Bergstrom, Richard Person, Abhishek Bavle, Leomar Y. Ballester, and Sarah Scollon et al. Integrated tumor and germline whole-exome sequencing identifies mutations in MAPK and PI3K pathway genes in an adolescent with rosette-forming glioneuronal tumor of the fourth ventricle. 
Cold Spring Harbor Molecular Case Studies, 2(5):a001057, Leora Witkowski, Jian Carrot-Zhang, Steffen Albrecht, Somayyeh Fahiminiya, Nancy Hamel, and Eva Tomiak et al. Germline and somatic 233

265 SMARCA4 mutations characterize small cell carcinoma of the ovary, hypercalcemic type. Nature Genetics, 46(5): , Pilar Ramos, Anthony N. Karnezis, David W. Craig, Aleksandar Sekulic, Megan L. Russell, and William P. D. Hendricks et al. Small cell carcinoma of the ovary, hypercalcemic type, displays frequent inactivating germline and somatic mutations in SMARCA4. Nature Genetics, 46(5): , Pierre-Marie Lavrut, Francois Le Loarer, Charline Normand, Celine Grosos, Remi Dubois, and Anni Buenerd et al. Small cell carcinoma of the ovary, hypercalcemic type/ovarian malignant rhabdoid tumor: report of a bilateral case in a teenager associated with SMARCA4 germline mutation. Pediatric Developmental Pathology, 19:56 60, Joanna Moes-Sosnowska, Lukasz Szafron, Dorota Nowakowska, Agnieszka Dansonka-Mieszkowska, Agnieszka Budzilowska, and Bozena Konopka et al. Germline SMARCA4 mutations in patients with ovarian small cell carcinoma of hypercalcemic type. Orphanet Journal of Rare Diseases, 10(1):32, Yoshitatsu Sei, Xilin Zhao, Joanne Forbes, Silke Szymczak, Qing Li, and Apurva Trivedi et al. A hereditary form of small intestinal carcinoid associated with a germline mutation in inositol polyphosphate multikinase. Gastroenterology, 149(1):67 78, Kie Kyon Huang, Kang Won Jang, Sangwoo Kim, Han Sang Kim, Sung-Moo Kim, and Hyeong Ju Kwon et al. Exome sequencing reveals recurrent REV3L mutations in cisplatin-resistant squamous cell carcinoma of head and neck. Scientific Reports, 6:19552, Huixing Pan, Xiaojian Xu, Deyao Wu, Qiaocheng Qiu, Shoujun Zhou, and Xuefeng He et al. Novel somatic mutations identified by whole-exome sequencing in muscle-invasive transitional cell carcinoma of the bladder. Oncology Letters, 11(2): , Sandra Hanks, Elizabeth R. Perdeaux, Sheila Seal, Elise Ruark, Shazia S. Mahamdallie, and Anne Murray et al. Germline mutations in the PAF1 complex gene CTR9 predispose to Wilms tumour. Nature Communications, 5,

721 Cristina R. Antonescu and Paola Dal Cin. Promiscuous genes involved in recurrent chromosomal translocations in soft tissue tumours. Pathology, 46(2): ,
722 Jungho Kim and Jerry Pelletier. Molecular genetics of chromosome translocations involving EWS and related family members. Physiological Genomics, 1(3): ,
Fredrik Mertens, Cristina R. Antonescu, and Felix Mitelman. Gene fusions in soft tissue tumors: recurrent and overlapping pathogenetic themes. Genes, Chromosomes and Cancer, 55(4): ,
Felix Mitelman, Bertil Johansson, and Fredrik Mertens. The impact of translocations and gene fusions on cancer causation. Nat Rev Cancer, 7(4): ,
725 Johanna Manner, Bernhard Radlwimmer, Peter Hohenberger, Katharina Mossinger, Stefan Kuffer, and Christian Sauer et al. MYC high level gene amplification is a distinctive feature of angiosarcomas after irradiation or chronic lymphedema. The American Journal of Pathology, 176(1):34-39,
Patrick S. Tarpey, Sam Behjati, Susanna L. Cooke, Peter Van Loo, David C. Wedge, and Nischalan Pillay et al. Frequent mutation of the major cartilage collagen gene COL2A1 in chondrosarcoma. Nature Genetics, 45(8): ,
Janusz Limon, Anna Szadowska, Mariola Iliszko, Malgorzata Babinska, Krzysztof Mrozek, and Janusz Jaskiewicz et al. Recurrent chromosome changes in two adult fibrosarcomas. Genes, Chromosomes and Cancer, 21(2): ,
Eva Van den Berg, Willemina M. Molenaar, Harald J. Hoekstra, Willem A. Kamps, and Bauke De Jong. DNA ploidy and karyotype in recurrent and metastatic soft tissue sarcomas. Modern Pathology, 5(5): ,
Paola Dal Cin, Patrick Pauwels, Raf Sciot, and Herman Van Den Berghe. Multiple chromosome rearrangements in a fibrosarcoma. Cancer Genetics and Cytogenetics, 87(2): ,

730 Jilong Yang, Xiaoling Du, Kexin Chen, Antti Ylipaa, Alexander J.F. Lazar, and Jonathan Trent et al. Genetic aberrations in soft tissue leiomyosarcoma. Cancer Letters, 275(1):1-8,
Avery A. Sandberg. Updates on the cytogenetics and molecular genetics of bone and soft tissue tumors: leiomyosarcoma. Cancer Genetics and Cytogenetics, 161(1):1-19,
732 Ahmed Idbaih, Jean-Michel Coindre, Josette Derre, Odette Mariani, Philippe Terrier, and Dominique Ranchere et al. Myxoid malignant fibrous histiocytoma and pleomorphic liposarcoma share very similar genomic imbalances. Laboratory Investigation, 85(2): ,
Hannelore Schmidt, Frank Bartel, Matthias Kappler, Peter Wurl, Heidemarie Lange, and Matthias Bache et al. Gains of 13q are correlated with a poor prognosis in liposarcoma. Modern Pathology, 18(5): ,
Barry S. Taylor, Jordi Barretina, Nicholas D. Socci, Penelope DeCarolis, Marc Ladanyi, and Matthew Meyerson et al. Functional copy-number alterations in cancer. PLOS ONE, 3(9):e3179,
Christopher D.M. Fletcher, Paola Dal Cin, Ivo De Wever, Nils Mandahl, Fredrik Mertens, and Felix Mitelman et al. Correlation between clinicopathological features and karyotype in spindle cell sarcomas: a report of 130 cases from the CHAMP study group. The American Journal of Pathology, 154(6): ,
Fredrik Mertens, Paola Dal Cin, Ivo De Wever, Christopher D.M. Fletcher, Nils Mandahl, and Felix Mitelman et al. Cytogenetic characterization of peripheral nerve sheath tumours: a report of the CHAMP study group. The Journal of Pathology, 190(1):31-38,
R. Stuart Bridge, Julia Ann Bridge, James R. Neff, Sabine Naumann, Pamela A. Althof, and Leslie A. Bruch. Recurrent chromosomal imbalances and structurally abnormal breakpoints within complex karyotypes of malignant peripheral nerve sheath tumour and malignant triton tumour: a cytogenetic and molecular cytogenetic study. Journal of Clinical Pathology, 57(11): ,

738 Fredrik Mertens, Christopher D.M. Fletcher, Paola Dal Cin, Ivo De Wever, Nils Mandahl, and Felix Mitelman et al. Cytogenetic analysis of 46 pleomorphic soft tissue sarcomas and correlation with morphologic and clinical features: a report of the CHAMP study group. Genes, Chromosomes and Cancer, 22(1):16-25,
Anwar N. Mohamed, Mark M. Zalupski, James R. Ryan, Fred Koppitch, Stanley Balcerzak, Raymond Kempf, and Sandra R. Wolman. Cytogenetic aberrations and DNA ploidy in soft tissue sarcoma: a Southwest Oncology Group Study. Cancer Genetics and Cytogenetics, 99(1):45-53,
Guidong Li, Akira Ogose, Hiroyuki Kawashima, Hajime Umezu, Tetsuo Hotta, and Tsuyoshi Tohyama et al. Cytogenetic and real-time quantitative reverse-transcriptase polymerase chain reaction analyses in pleomorphic rhabdomyosarcoma. Cancer Genetics and Cytogenetics, 192(1):1-9,
Anthony Gordon, Aidan McManus, John Anderson, Cyril Fisher, Syuiti Abe, and Takayuki Nojima et al. Chromosomal imbalances in pleomorphic rhabdomyosarcomas and identification of the alveolar rhabdomyosarcoma-associated PAX3-FOXO1A fusion gene in one case. Cancer Genetics and Cytogenetics, 140(1):73-77,
Josette Derre, Real Lagace, Andre Nicolas, Aline Mairal, Frederic Chibon, and Jean-Michel Coindre et al. Leiomyosarcomas and most malignant fibrous histiocytomas share very similar comparative genomic hybridization imbalances: an analysis of a series of 27 leiomyosarcomas. Laboratory Investigation, 81(2): ,
Marcelo L. Larramendy, Massimiliano Gentile, Sonia Soloneski, Sakari Knuutila, and Tom Bohling. Does comparative genomic hybridization reveal distinct differences in DNA copy number sequence patterns between leiomyosarcoma and malignant fibrous histiocytoma? Cancer Genetics and Cytogenetics, 187(1):1-11,
Ana Carneiro, Princy Francis, Par-Ola Bendahl, Josefin Fernebro, Mans Akerman, and Christopher Fletcher et al. Indistinguishable genomic profiles and shared prognostic markers in undifferentiated pleomorphic sarcoma and leiomyosarcoma: different sides of a single coin? Laboratory Investigation, 89(6): ,
745 Ching C. Lau, Charles P. Harris, Xin-Yan Lu, Laszlo Perlaky, Sheila Gogineni, and Murali Chintagumpala et al. Frequent amplification and rearrangement of chromosomal bands 6p12-p21 and 17p11.2 in osteosarcoma. Genes, Chromosomes and Cancer, 39(1):11-21,
Shamini Selvarajah, Maisa Yoshimoto, Olga Ludkovski, Paul C. Park, Jane Bayani, and Paul Thorner et al. Genomic signatures of chromosomal instability and osteosarcoma progression detected by high resolution array CGH and interphase FISH. Cytogenetic and Genome Research, 122(1):5-15,
Jane Bayani, Maria Zielenska, Ajay Pandita, Khaldoun Al-Romaih, Jana Karaskova, and Karen Harrison et al. Spectral karyotyping identifies recurrent complex rearrangements of chromosomes 8, 17, and 20 in osteosarcomas. Genes, Chromosomes and Cancer, 36(1):7-16,
Julia A. Bridge, Marilu Nelson, Erin McComb, Michael H. McGuire, Howard Rosenthal, and Gerardo Vergara et al. Cytogenetic findings in 73 osteosarcoma specimens and a review of the literature. Cancer Genetics and Cytogenetics, 95(1):74-87,

Appendices


Appendix A World Health Organisation classification of soft tissue tumours and bone tumours
SOFT TISSUE TUMOURS
Adipocytic tumours Benign Lipoma Lipomatosis Lipomatosis of nerve Lipoblastoma / lipoblastomatosis Angiolipoma Myolipoma of soft tissue Chondroid lipoma Extra-renal angiomyolipoma Extra-adrenal myelolipoma Spindle cell / pleomorphic lipoma Hibernoma

Intermediate (locally aggressive) Atypical lipomatous tumour / well differentiated liposarcoma Malignant Dedifferentiated liposarcoma Myxoid liposarcoma Pleomorphic liposarcoma Liposarcoma, not otherwise specified Atypical lipomatous tumour (ALT) Adipocytic (lipoma-like) Sclerosing Inflammatory types Dedifferentiated liposarcoma Fibroblastic / myofibroblastic tumours Benign Nodular fasciitis Proliferative fasciitis Proliferative myositis Myositis ossificans Fibro-osseous pseudotumour of digits Ischemic fasciitis Elastofibroma Fibrous hamartoma of infancy Fibromatosis colli Juvenile hyaline fibromatosis Inclusion body fibromatosis Fibroma of tendon sheath Desmoplastic fibroblastoma Mammary-type myofibroblastoma Calcifying aponeurotic fibroma

Angiomyofibroblastoma Cellular angiofibroma Nuchal-type fibroma Gardner fibroma Calcifying fibrous tumour Intermediate (locally aggressive) Palmar / plantar fibromatosis Desmoid-type fibromatosis Lipofibromatosis Giant cell fibroblastoma Intermediate (rarely metastasizing) Dermatofibrosarcoma protuberans Fibrosarcomatous dermatofibrosarcoma protuberans Pigmented dermatofibrosarcoma protuberans Solitary fibrous tumour Solitary fibrous tumour, malignant Inflammatory myofibroblastic tumour Low grade myofibroblastic sarcoma Myxoinflammatory fibroblastic sarcoma Atypical myxoinflammatory fibroblastic tumour Infantile fibrosarcoma Malignant Adult fibrosarcoma Myxofibrosarcoma Low-grade fibromyxoid sarcoma Sclerosing epithelioid fibrosarcoma Nodular fasciitis Extrapleural solitary fibrous tumour Low grade fibromyxoid sarcoma (LGFMS)

Sclerosing epithelioid fibrosarcoma (SEF) So-called fibrohistiocytic tumours Benign Tenosynovial giant cell tumour Localized type Diffuse type Malignant Deep benign fibrous histiocytoma Intermediate (rarely metastasizing) Plexiform fibrohistiocytic tumour Giant cell tumour of soft tissue Tenosynovial giant cell tumour Smooth-muscle tumours Benign Leiomyoma of deep soft tissue Malignant Leiomyosarcoma (excluding skin) Leiomyosarcoma Pericytic (perivascular) tumours Glomus tumour (and variants) Glomangiomatosis Malignant glomus tumour Myopericytoma Myofibroma Myofibromatosis Angioleiomyoma Skeletal-muscle tumours Rhabdomyoma Embryonal rhabdomyosarcoma

Alveolar rhabdomyosarcoma Pleomorphic rhabdomyosarcoma Spindle cell / Sclerosing rhabdomyosarcoma Alveolar rhabdomyosarcoma (ARMS) Vascular tumours Benign Haemangioma Synovial Venous Arteriovenous haemangioma / malformation Epithelioid haemangioma Angiomatosis Lymphangioma Intermediate (locally aggressive) Kaposiform haemangioendothelioma Intermediate (rarely metastasizing) Retiform haemangioendothelioma Papillary intralymphatic angioendothelioma Composite haemangioendothelioma Pseudomyogenic (epithelioid sarcoma-like) haemangioendothelioma Kaposi sarcoma Malignant Epithelioid haemangioendothelioma Angiosarcoma of soft tissue Gastrointestinal stromal tumours Benign gastrointestinal stromal tumour Gastrointestinal stromal tumour Gastrointestinal stromal tumour

Nerve sheath tumours Benign Schwannoma (including variants) Melanotic schwannoma Neurofibroma (including variants) Plexiform neurofibroma Perineurioma Malignant perineurioma Granular cell tumour Dermal nerve sheath myxoma Solitary circumscribed neuroma Ectopic meningioma Nasal glial heterotopia Benign Triton tumour Hybrid nerve sheath tumours Malignant Malignant peripheral nerve sheath tumour Epithelioid malignant nerve sheath tumour Malignant Triton tumour Malignant granular cell tumour Ectomesenchymoma Tumours of uncertain differentiation Benign Acral fibromyxoma Intramuscular myxoma (including cellular variant) Juxta-articular myxoma Deep (aggressive) angiomyxoma Pleomorphic hyalinizing angiectatic tumour Ectopic hamartomatous thymoma

Intermediate (locally aggressive) Haemosiderotic fibrolipomatous tumour Intermediate (rarely metastasizing) Atypical fibroxanthoma Angiomatoid fibrous histiocytoma Ossifying fibromyxoid tumour Ossifying fibromyxoid tumour, malignant Mixed tumour NOS Mixed tumour NOS, malignant Myoepithelioma Myoepithelial carcinoma Phosphaturic mesenchymal tumour Phosphaturic mesenchymal tumour Malignant Synovial sarcoma NOS Synovial sarcoma, spindle cell Synovial sarcoma, biphasic Epithelioid sarcoma Alveolar soft-part sarcoma Clear cell sarcoma of soft tissue Extraskeletal myxoid chondrosarcoma Extraskeletal Ewing sarcoma Desmoplastic small round cell tumour Extra-renal rhabdoid tumour Neoplasms with perivascular epithelioid cell differentiation (PEComa) PEComa NOS, benign PEComa NOS, malignant Intimal sarcoma

Undifferentiated / unclassified sarcomas Undifferentiated spindle cell sarcoma Undifferentiated pleomorphic sarcoma Undifferentiated round cell sarcoma Undifferentiated epithelioid sarcoma Undifferentiated sarcoma NOS Undifferentiated round cell and spindle cell sarcoma Undifferentiated pleomorphic sarcoma (UPS)
TUMOURS OF BONE
Chondrogenic tumours Benign Osteochondroma Chondroma Enchondroma Periosteal chondroma Osteochondromyxoma Subungual exostosis Bizarre parosteal osteochondromatous proliferation Synovial chondromatosis Intermediate (locally aggressive) Chondromyxoid fibroma Atypical cartilaginous tumour / chondrosarcoma grade I Intermediate (rarely metastasizing) Chondroblastoma Malignant Chondrosarcoma Grade II, Grade III Dedifferentiated chondrosarcoma Mesenchymal chondrosarcoma

Clear cell chondrosarcoma Osteochondromyxoma Bizarre parosteal osteochondromatous proliferation Chondrosarcoma (grades I-III) Osteogenic tumours Benign Osteoma Osteoid osteoma Intermediate (locally aggressive) Osteoblastoma Malignant Low-grade central osteosarcoma Conventional osteosarcoma Chondroblastic osteosarcoma Fibroblastic osteosarcoma Osteoblastic osteosarcoma Telangiectatic osteosarcoma Small cell osteosarcoma Secondary osteosarcoma Parosteal osteosarcoma Periosteal osteosarcoma High-grade surface osteosarcoma Osteoclastic giant cell rich tumours Benign Giant cell lesion of the small bones Intermediate (locally aggressive) Giant cell tumour of bone Malignant Malignancy in giant cell tumour of bone

Fibrohistiocytic tumours Benign Benign fibrous histiocytoma / non-ossifying fibroma Notochordal tumours Benign Benign notochordal tumour Malignant Chordoma Vascular tumours Benign Haemangioma Intermediate (locally aggressive, rarely metastasizing) Epithelioid hemangioma Malignant Epithelioid hemangioendothelioma Angiosarcoma
Reference: Bridge, J. A., et al. WHO classification of tumours of soft tissue and bone. International Agency for Research on Cancer,

Appendix B Novel tumour-predisposing genes identified by whole exome sequencing

Cancer | Population | Patients | Genes | Citation | Additional studies
Asbestos-exposed lung adenocarcinoma | Finland | 26 cases | MRPL1, SDK1, SEMA5B, INPP4A | 594 |
Adenomatous polyposis and colorectal carcinomas | Netherlands, USA | 51 patients from 48 families, negative for APC and MUTYH mutations | NTHL1 | 595 |
Atypical gastric neuroendocrine tumour, type 1 | Spain | Large family, with consanguineous parents and 5/10 affected children | ATP4A | 596 |
Brain | Germany | 1 family | CASP9 | 597 |
Breast | Poland, Canada | 144 Polish and 51 French-Canadian patients with family history and/or early onset, negative for founder mutations in BRCA1, BRCA2, CHEK2, NBN and PALB2 | RECQL | |
 | China | 9 early-onset patients with family history, negative for BRCA1 and BRCA2 mutations | RECQL | |
 | Finland | 24 breast cancer patients from 11 families, negative for BRCA1 and BRCA2 mutations | FANCM | , 602 |

 | Multiple | 89 early-onset breast cancer patients from 47 families | RINT1 | 603 |
 | Australia | 33 breast cancer patients from families, negative for BRCA1 and BRCA2 mutations | FANCC, BLM | |
 | Multiple | 13 families | XRCC2 | 608 |
 | Finland | 129 female hereditary breast and/or ovarian cancer patients, up to 989 female controls | ATM, MYC, PLAU, RAD1, and RRM2B | 609 |
Chondrosarcoma | France | 2 third-degree affected relatives in a single family | EXT2 | 610 |
Chronic lymphocytic leukaemia | European | 59 chronic lymphocytic leukaemia-prone families and 173 unrelated chronic lymphocytic leukaemia patients | ITGB2 | 611 |
 | UK | 66 chronic lymphocytic leukaemia families | POT1 | 612 |
Colorectal | China | 23 early onset colorectal cancer patients from 21 families | EIF2AK4 | 613 |
 | Spain | 3 patients from a large family | FAN | |

 | Spain | Patients from 29 families, negative for mutations in known colorectal cancer genes | CDKN1B, XRCC4, EPHX1, NFKBIZ, SMARCA4, BARD1 | 615 |
 | Finland | 4 patients from a large family, negative for mutations in known colorectal cancer genes | RPS | |
 | Finland | 96 patients with family history of colorectal cancer, negative for mutations in known colorectal cancer genes | UACA, SFXN4, TWSG1, PSPH, NUDT7, ZNF490, PRSS37, CCDC18, PRADC1, MRPL3, AKR1C4 | 617 |
 | Taiwan | 50 colorectal cancer cases | NRAS | 618 |
 | UK | 1,006 early-onset familial CRC cases and 1,609 healthy controls | MRE11, POLE2 and POT1 | 619 |

 | Netherlands | 55 colorectal cancer cases with a disease onset before 45 years of age | PTPN and LRP6 | |
Colorectal adenomas and carcinomas | UK | Probands from 15 colorectal adenoma families, negative for mutations in APC and MUTYH | POLE, POLD1 | |
 | Ashkenazi | 2 sisters | MCM9 | 630 |
Colorectal adenomatous polyposis | Germany | 102 unrelated individuals | MSH2 | 631 |
 | Germany | 12 colorectal adenomas from seven unrelated patients | DSC2, PIEZO1, ZSWIM7 | 629 |
Colorectal adenomatous polyposis | Germany | 12 colorectal adenomas from 7 unrelated patients with unexplained sporadic adenomatous polyposis | DSC2, PIEZO1, ZSWIM7 | 632 |
Esophageal adenocarcinoma and Barrett esophagus | USA | Large family | VSIG10L | |

Esophageal squamous cell carcinoma | China | 51 stage I and 53 stage III esophageal squamous cell carcinomas | FAM84B | 634 |
Familial schwannomatosis | China | Large family, negative for mutations in known disease-causing genes | COQ6 | 635 |
Gastric | Finland | Large family with the diffuse type of gastric cancer, negative for mutations in CDH1 | INSR, FBXO24, DOT1L | 636 |
 | Netherlands | Large family with the diffuse type of gastric cancer, negative for mutations in CDH1 | CTNNA | |
Glioma | Multiple | 90 patients from 55 families | POT1 | 639 |
Hodgkin lymphoma | Middle East | Family with 3 out of 5 affected children and healthy parents | ACAN | 640 |
 | Finland | Large family with nodular lymphocyte predominant Hodgkin lymphoma | NPAT | 641 |
 | USA | 17 Hodgkin lymphoma prone families with three or more affected cases or obligate carriers (69 individuals) | KDR | 642 |
Infantile myofibromatosis | Brazil | 2 affected brothers and their healthy consanguineous parents | NDRG4 | 643 |

 | Multiple | 11 patients from 4 families, and 5 simplex cases | PDGFRB | |
 | USA | 11 patients from 9 families | PDGFRB | 645 |
 | USA | Large family, negative for PDGFRB mutations | NOTCH | |
Invasive pituitary adenomas | China | 6 invasive pituitary adenomas and 6 non-invasive pituitary adenomas | DPCR1, EGFL7, the PRDM family and LRRC50 | 646 |
Juvenile hamartomatous polyposis syndrome | USA | Single patient, with extensive family history, negative for known disease-causing mutations | SMAD9 | 647 |
Kaposi sarcoma | Finland | Large family | STAT4 | 648 |
Kaposiform hemangioendothelioma | Japan | Matched tumour and normal sample from an individual | ITGB2, IL-32 and DIDO1 | 649 |
Liver | France | 2 individuals from a family with recurrent well-differentiated hepatocellular tumours | DICER1 | 650 |
Lung | USA | Large family | PARK | |
 | Taiwan | Large family | YAP1 | 652 |

 | Arab | An individual with lung cancer from an extended family segregating different types of hereditary cancer | NBN | 653 |
Lymphoblastic leukaemia | Multiple | Large family | ETV | |
Male breast | Italy | 1 male and 2 female BRCA1/2 mutation-negative breast cancer cases from a family | PALB2 | 656 |
Melanoma | Multiple | 101 patients from 56 melanoma families, negative for CDKN2A and CDK4 mutations | POT | |
 | Multiple | 184 patients from 105 melanoma families, negative for CDKN2A and CDK4 mutations | POT | |
 | USA, Australia, UK | Patient from large melanoma family | MITF | , 660 |
 | USA | Uveal melanoma patients | BAP | |
 | Finland | 21 cases | BAP1 | 594 |
Melanotic neuroectodermal tumour of infancy | UK | Single patient | CDKN2A | 677 |

Multiple spinal meningiomas | UK | 3 unrelated individuals with familial multiple spinal meningiomas, negative for mutations in NF2 and SMARCB1 | SMARCE | |
Nasopharyngeal carcinoma | China | 161 NPC cases and 895 controls of Southern Chinese descent | MST1R | 682 |
Nonmedullary thyroid cancer | USA | Large family | HABP2 | 683 |
 | China | 5 subjects from a large family | RTFC | 684 |
Ovarian | UK, USA, Australia, Germany | 412 high grade serous ovarian cancer | FANCM | 685 |
Paediatric hepatocellular carcinoma | USA | Single patient | ABCB | |
Paediatric poorly differentiated carcinoma | USA | Patient with pediatric poorly differentiated carcinoma | APC | 687 |
Pancreatic ductal adenocarcinoma | Japan | 4 cases of KRAS mutation-negative pancreatic ductal adenocarcinoma | DCTN1-ALK fusion | 688 |
Papillary thyroid carcinoma | USA, Canada | Large family | SRRM2 | 689 |

Paraganglioma | Spain | Patient with multiple paragangliomas and family history of the disease | MDH2 | 690 |
Penile squamous cell carcinoma | UK | 27 patients | CSN1 | 691 |
Pheochromocytoma | Spain | 3 patients with familial pheochromocytoma, negative for mutations in known disease causing genes | MAX | , 694 |
Pre-B cell acute lymphoblastic leukemia | Puerto Rican African American ancestry | 2 families | PAX2 | 695 |
Primary thyroid and breast | USA | 14 female research participants with primary thyroid and breast cancers without mutations in PTEN | PARP4 | 696 |

Primary lymphedema associated with a predisposition to acute myeloid leukemia (Emberger syndrome) | European and Chinese descent | 2 unrelated patients with family history and 1 sporadic case | GATA | |
Prostate | USA | 91 patients from 19 families | BTNL2 | 705 |
 | African American | 652 aggressive prostate cancer patients and 752 disease-free controls | TET2 | 706 |
 | Japan | 140 patients with PC from 66 families | TRRAP | 707 |
 | USA | 75 high risk families | TANGO2, OR5H14, and CHAD | |

 | Australia | 5 prostate cancer-affected men from 3 families | PCTP, MCRS1, ATRIP, PARP2, CYP3A43, DOK3, PLEKHH3, HEATR5B, GPR124, and HKR1 | 709 |
Renal angiomyolipoma | USA | 15 patients | TSC1 and TSC2 | |
Renal cell carcinoma | Korea | 10 patients | FOXC2 and CLIP | |
Rosette-forming glioneuronal tumour | African American | A patient with rosette-forming glioneuronal tumour of the fourth ventricle | FGFR1, PIK3CA, PTPN11 | 712 |
Small cell carcinoma of the ovary, hypercalcemic type | USA, Canada, UK | 6 patients from 3 families | SMARCA | |
 | Multiple | 7 patients | SMARCA | , 715, 716 |
Small intestinal carcinoids | USA | Large family | IPMK | 717 |

Squamous cell carcinoma of head and neck | Korea | 18 cisplatin-resistant metastatic tumours and matched germline | REV3L | 718 |
Transitional cell carcinoma | China | 2 patients | HECW1 | 719 |
Wilms tumour | UK | 35 families | CTR9 | 720 |

The PubMed search was performed using the string (exome OR exom* OR NGS OR whole genome OR next-generation OR next generation OR WES) AND (familial OR hereditary OR susceptib* OR risk OR germline OR germline) AND (sequencing OR analysis) AND (cancer OR malignancy OR tumor* OR tumour*) AND English [lang]. Only studies that reported the identification of novel genes by exome sequencing were included. Search results included up to March
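The search string is four OR-groups joined by AND, followed by a language filter. A minimal sketch of how such a string could be assembled reproducibly is given here; the helper names `or_group` and `build_query` are hypothetical and not part of the thesis, and the repeated `germline` term is kept exactly as it appears in the printed string.

```python
# Illustrative sketch only: helper names are hypothetical, not from the thesis.
# Term groups are copied from the search string in Appendix B; the repeated
# "germline" is reproduced as printed there.
TERM_GROUPS = [
    ["exome", "exom*", "NGS", "whole genome", "next-generation",
     "next generation", "WES"],
    ["familial", "hereditary", "susceptib*", "risk", "germline", "germline"],
    ["sequencing", "analysis"],
    ["cancer", "malignancy", "tumor*", "tumour*"],
]


def or_group(terms):
    """Join the terms of one group with OR and wrap them in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def build_query(groups, language="English"):
    """AND the OR-groups together and append the language filter."""
    clauses = [or_group(group) for group in groups]
    clauses.append(f"{language} [lang]")
    return " AND ".join(clauses)


query = build_query(TERM_GROUPS)
print(query)
```

Keeping the terms in lists makes it easy to record exactly which synonyms were searched and to rerun the search with an updated cut-off date.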


Appendix C Familial cancer syndromes associated with sarcomas

Syndrome | Sarcoma | Inheritance | Gene (location) | Features
Beckwith-Wiedemann syndrome | RMS | AD | NSD1 (5q35.3), KIP2 (11p15.4), CDKN1C (11p15.4), H19 (11p15.5), KCNQ1OT1 (11p15.5), ICR1 (11p15.5) | Overgrowth syndrome: macroglossia, omphalocele, hemihypertrophy, gigantism, and associated tumour predisposition
Bloom syndrome | RMS | AR | BLM (15q26.1) | Progerioid syndrome: growth retardation, sun sensitivity, telangiectasias and other skin changes, and associated tumour predisposition
Costello syndrome | RMS | AD | HRAS (11p15.5) | Rasopathy: coarse facies, short stature, distinctive hand posture and appearance, cardiac anomalies, developmental delay, congenital myopathy
Familial adenomatous polyposis | Gardner fibroma, desmoid, RMS | AD | APC (5q21-q22) | Individuals develop hundreds to thousands of polyps of the colon and rectum that can progress to colorectal carcinoma if not treated
Familial gastrointestinal stromal tumour | Gastrointestinal stromal tumour | AD | KIT (4q12), PDGFRA (4q12) | Multiple gastrointestinal stromal tumours
Glomus tumours | Glomus tumour | AD | GLMN (1p22.1) | Glomuvenous malformations, glomangioma

Gorlin-Goltz nevoid basal cell carcinoma | RMS, fetal rhabdomyoma | AD | PTCH (Xp11.23) | Multiple basal cell carcinomas, odontogenic keratocysts, palmar/plantar pits, calcification of the falx cerebri, rib abnormalities
Hereditary leiomyomatosis and renal cell carcinoma syndrome | Leiomyosarcoma (uterus) | AD | FH (1q43) | Tumour predisposition syndrome: cutaneous piloleiomyomas, uterine leiomyomas, type 2 papillary renal cell carcinomas
Hereditary Retinoblastoma | Sarcomas as second malignant neoplasm, lipoma | AD | RB1 (13q14.2) | Retinoblastoma, often bilateral and typically in very early childhood
Leiomyomatosis-Alport syndrome | Leiomyoma | XLD | COL4A6 (Xq22.3) | Alport syndrome plus multiple, diffuse leiomyomas
Li-Fraumeni syndrome | RMS, undifferentiated pleomorphic sarcoma, pleomorphic liposarcoma | AD | TP53 (17p13.1) | Inherited cancer syndrome: early onset of tumours, multiple tumours within individual; most commonly sarcomas, others include breast cancer, central nervous system tumours, leukaemia and adrenocortical carcinoma
Maffucci syndrome | Spindle cell hemangiomas | | IDH1 (2q34), IDH2 (15q26.1) | Multiple enchondromas (increased risk of chondrosarcoma) and hemangiomas
Mazabraud syndrome | Myxomas | | GNAS1 (20q13.32) | Myxomas and fibrous dysplasia

Mosaic variegated aneuploidy | RMS | AR | BUB1B (15q15) | Intrauterine growth restriction, microcephaly, spectrum of other anomalies, and a high risk of malignancy including RMS, Wilms, and hematologic malignancy
Neurofibromatosis type 1 | Malignant peripheral nerve sheath tumour, RMS, neurofibroma, gastrointestinal stromal tumour | AD | NF1 (17q11.2) | Cafe-au-lait spots, Lisch nodules in the eye, increased susceptibility to benign and malignant tumours
Neurofibromatosis type 2 | Schwannoma, RMS, malignant rhabdoid tumour | AD | NF2 (22q12.2) | Tumours of the eighth cranial nerve (usually bilateral) and other schwannomas, meningiomas of the brain, and schwannomas of the dorsal roots of the spinal cord
Nijmegen breakage syndrome | RMS | AR | NBS1 (8q21.3) | Chromosomal instability syndrome - microcephaly, growth retardation, immunodeficiency, and tumour predisposition
Noonan syndrome | RMS, lymphangioma | AD | PTPN11 (12q24) | Rasopathy - dysmorphic facies, short stature, neck webbing, cardiac anomalies, deafness, bleeding diathesis
Roberts syndrome | RMS | AR | ESCO2 (8p21.1) | Range of mild to severe malformation of bones, arms, legs, skull, and face - features similar to those seen in thalidomide exposure

Rothmund-Thomson syndrome | Osteosarcoma | AR | RTS (18q24.3) | Skin atrophy, telangiectasia, hyper- and hypopigmentation, congenital skeletal abnormalities, short stature, premature ageing, and increased risk of malignant disease
Rubinstein-Taybi syndrome | RMS | AD | CREBBP (16p13.1) | Multiple congenital anomalies, developmental delay, microcephaly, dysmorphic features, and tumour predisposition
Simpson-Golabi-Behmel syndrome | Embryonal tumours | XLR | GPC3 | Overgrowth syndrome - coarse facies, congenital heart defects, overgrowth, and other anomalies
Tuberous sclerosis | RMS, cardiac rhabdomyoma, chordoma, renal angiomyolipoma, perivascular epithelioid cell tumours | AD | TSC1 (9q34), TSC2 (16p13.3), TSC3 (12q22-24.1) | Hamartomas of multiple organs, angiomyolipomas, other renal tumours (cysts and renal cell carcinomas), lymphangioleiomyomatosis, angiofibromas and other skin lesions
Werner syndrome | RMS | AR | WRN (8p12-p11.2) | Progerioid syndrome - scleroderma-like skin changes, early onset atherosclerosis and diabetes

AD: autosomal dominant. AR: autosomal recessive. RMS: rhabdomyosarcoma. XLD: X linked dominant. XLR: X linked recessive.


Appendix D Translocations associated with sarcomas

Translocation | Genes
Alveolar rhabdomyosarcoma
t(2;13)(q36;q14) | PAX3-FOXO1
t(1;13)(p36;q14) | PAX7-FOXO1
t(8;13;9)(p11;q14;q32) | FOXO1-FGFR1
t(X;2)(q13;q36) | PAX3-FOXO4
t(2;2)(p23;q36) | PAX3-NCOA1
t(2;8)(q36;q13) | PAX3-NCOA2
Alveolar soft-part sarcoma
t(X;17)(p11.2;q25) | ASPL-TFE3
Angiomatoid fibrous histiocytoma
t(2;22)(q33;q12) | EWSR1-CREB1
t(12;16)(q13;p11) | FUS-ATF1
t(12;22)(q13;q12) | EWSR1-ATF1
Chondroid lipoma
t(11;16)(q13;p13) | C11orf95-MKL2

Clear-cell sarcoma
t(2;22)(q33;q12) | EWSR1-CREB1
t(12;22)(q13;q12) | EWSR1-ATF1
Congenital fibrosarcoma
t(12;15)(p13;q25) | ETV6-NTRK3
Dedifferentiated liposarcoma
t(5;5)(p15;p15) | TRIO-TERT
t(9;12)(q33;q15) | CNOT2-ASTN2
?t(12)(q14q14) | CTDSP2-FAM19A2
t(9;12)(q33;q21) | NR6A1-TRHDE
?t(12)(q15q21) | NUP107-LGR5
t(9;12)(q33;q15) | NUP107-PAPPA
t(5;14)(p13;q32) | RCOR1-WDR70
Dermatofibrosarcoma protuberans
t(17;22)(q22;q13) | COL1A1-PDGFB
Desmoplastic small round-cell tumour
t(11;22)(p13;q12) | EWSR1-WT1
t(21;22)(q22;q12) | EWSR1-ERG
Endometrial stromal sarcoma
t(6;10)(p21;p11) | EPC1-PHF1
t(6;7)(p21;p15) | JAZF1-PHF1
t(7;17)(p15;q11) | JAZF1-SUZ12
t(1;6)(p34;p21) | MEAF6-PHF1
t(10;17)(q23;p13) | YWHAE-FAM22A
t(10;17)(q22;p13) | YWHAE-FAM22B
t(X;22)(p11;q13) | ZC3H7B-BCOR
Epithelioid hemangioendothelioma
t(1;3)(p36;q25) | WWTR1-CAMTA1
t(X;11)(p11;q22) | YAP1-TFE3

Epithelioid sarcoma of the ovary
t(12;12)(q23;q24) | CMKLR1-HNF1A
t(12;12)(q13;q22) | ERBB3-CRADD
t(1;22)(p36;q11) | SMARCB1-WASF2
Ewing's sarcoma
t(11;22)(q24;q12) | EWSR1-FLI1
t(21;22)(q22;q12) | EWSR1-ERG
t(7;22)(p22;q12) | EWSR1-ER81
t(17;22)(q21;q12) | EWSR1-ETV4
t(2;22)(q33;q12) | EWSR1-FEV
t(21;22)(q22;q12) | EWSR1-ERG
t(16;21)(p11;q24) | FUS-ERG
t(2;16)(q35;p11) | FUS-FEV
t(20;22)(q13;q12) | EWSR1-NFATC1
t(6;22)(p21;q12) | EWSR1-POU5F1
t(4;22)(q31;q12) | EWSR1-SMARCA5
t(7;22)(p21;q12) | EWSR1-ETV1
Fibromyxoid sarcoma
t(7;16)(q34;p11) | FUS-CREB3L2
t(11;16)(p11;p11) | FUS-CREB3L1
t(11;22)(p11;q12) | EWSR1-CREB3L1
Inflammatory myofibroblastic tumour
2p23 rearrangements | TPM3-ALK; TPM4-ALK
inv(2)(p23q35) | ATIC-ALK
t(2;11)(p23;p15) | CARS-ALK
t(2;17)(p23;q23) | CLTC-ALK
t(2;12)(p23;p11) | PPFIBP1-ALK
t(2;2)(p23;q13) | RANBP2-ALK
t(X;6)(p11;p24) | RREB1-TFE3
t(2;4)(p23;q21) | SEC31A-ALK
t(1;2)(q21;p23) | TPM3-ALK

t(2;19)(p23;p13) | TPM4-ALK
t(2;2)(p21;p23) | EML4-ALK
Kaposi's sarcoma
(no translocation listed) | EZH2, SIRT1
Leiomyoma of the uterus
inv(7)(p21q22) | CUX1-AGR3
t(12;14)(q14;q11) | HMGA2-CCNB1IP1
t(7;12)(q31;q14) | HMGA2-COG5
t(8;12)(q22;q14) | HMGA2-COX6C
t(12;14)(q14;q24) | HMGA2-RAD51L1
Leiomyosarcoma
(no translocation listed) | SIRT1
Lipoblastoma
t(7;8)(q21;q12) | COL1A2-PLAG1
t(2;8)(q31;q12.1) | COL3A1-PLAG1
del(8)(q12q24) | HAS2-PLAG1
Lipoma
t(5;12)(q33;q14) | EBF1-LOC
t(2;12)(q37;q14) | HMGA2-CXCR7
t(5;12)(q33;q14) | HMGA2-EBF1
t(12;13)(q14;q13) | HMGA2-LHFP
t(3;12)(q28;q14) | HMGA2-LPP
t(9;12)(p22;q14) | HMGA2-NFIB
t(1;12)(p32;q14) | HMGA2-PPAP2B
t(3;12)(q28;q14) | LPP-C12orf9
Mesenchymal chondrosarcoma
t(8;8)(q12;q21) | HEY1-NCOA2
t(1;5)(q42;q32) | IRFBP2-CDX1
Myoepithelioma
t(12;22)(q13;q12) | EWSR1-ATF1
t(1;22)(q23;q12) | EWSR1-PBX1
t(6;22)(p21;q12) | EWSR1-POU5F1
t(19;22)(q13;q12) | EWSR1-ZNF

Myxoid chondrosarcoma
t(9;17)(q31;q12) | TAF15-NR4A3
t(3;9)(q12;q31) | TFG-NR4A3
t(9;15)(q31;q21) | TCF12-NR4A3
t(9;22)(q22-31;q11-12) | EWSR1-NR4A3
Myxoid liposarcoma
t(12;16)(q13;p11) | FUS-DDIT3
t(12;22)(q13;q12) | EWSR1-DDIT3
Ossifying fibromyxoid tumour
t(6;12)(p21;q24) | EP400-PHF1
PEComa
t(X;1)(p11;p34) | SFPQ-TFE3
t(14;X)(q24;q12) | RAD51B-OPHN1
t(14;X)(q24;p11) | RAD51B-RRAGB
Pericytoma
t(7;12)(p22;q13) | ACTB-GLI1
Primary pulmonary myxoid sarcoma
t(2;22)(q33;q12) | EWSR1-CREB1
Sclerosing epithelioid fibrosarcoma
t(7;16)(q34;p11) | FUS-CREB3L2
t(11;22)(p11;q12) | EWSR1-CREB3L1
t(7;22)(q3;q12) | EWSR1-CREB3L2
Soft tissue angiofibroma
t(5;8)(p15;q13) | AHRR-NCOA2
t(7;8;14)(q11;q13;q31) | GTF2I-NCOA2
Soft tissue chondroma
t(3;12)(q28;q14) | HMGA2-LPP
Solitary fibrous tumour
inv(12)(q13q13) | NAB2-STAT6

Spindle cell rhabdomyosarcoma
t(6;8)(p21;q13) | SRF-NCOA2
t(8;11)(q13;p15) | TEAD1-NCOA2
t(6;6)(q22;q24) | VGLL2-CITED2
t(6;8)(q22;q13) | VGLL2-NCOA2
Synovial sarcoma
t(X;18)(p11;q11) | SS18-SSX1, SS18-SSX2, SS18-SSX4
Tenosynovial giant cell tumour
t(1;2)(p13;q37) | COL6A3-CSF1
Undifferentiated sarcomas
inv(X)(p11p11) | BCOR-CCNB3
t(4;19)(q35;q13) | CIC-DUX4
t(10;19)(q26;q13) | CIC-DUX4L10
t(6;22)(p21;q12) | EWSR1-POU5F1
t(2;22)(q31;q12) | EWSR1-SP3

Citations: 106, 107,

Appendix E Genetically complex sarcomas

Sarcoma  Genes  References
Angiosarcoma
    Amplification: 8q24.21 (MYC), 10p12.33, 5q
Chondrosarcoma (types other than extraskeletal myxoid)
    COL2A1, IDH1, IDH2, TP53, RB1 pathway
    Reference: 726
Embryonal rhabdomyosarcoma
    Polysomy: 8, 2, 11, 12, 13 and 20. Monosomy: 10 and 15. LOH: 11p15.5 (IGF2, H19, CDKN1C, HOTS). Gains: 12q13
Fibrosarcoma (other than congenital)
    Multiple non-specific numerical and structural chromosomal abnormalities. Gain: 22q (PDGF-B)
Leiomyosarcoma
    Gains: 1, 5, 6, 8, 15, 16, 17, 19, 20, 22, X. Losses: 1p, 2, 3, 4, 6q, 8, 9, 10p, 11p, 12q, 11q, 13, 16, 17p, 18, 19, 22q. Amplifications: 1, 5, 8, 12, 13, 17, 19, 20
Liposarcoma (types other than myxoid)
    Gains of 1p, 1q21-q32, 2q, 3p, 3q, 5p12-p15, 5q, 6p21, 7p, 7q22, 8q, 10q, 12q12-q24, 13q, 14q, 15q, 17p, 17q, 18p, 18q12, 19p12, 19q13, 20q, 22q, and Xq21-q27. Losses: 1q, 2q, 3p, 4q, 10q, 11q, 12p13, 13q14, 13q21-qter, 14q23-24, 16q22, 17p13, 17q11.2, and 22q

Malignant peripheral nerve-sheath tumour
    Gains: 7p21-q36, 7p22, 7q, 8, 8q11-23, 1q25-44, and 5q
    Losses: 1p12-13, 1p21, 1p36, 3p21-pter, 9p13-21, 9p22-24, 10, 10p11-15, 11p, 11q21-25, 13q14, 15p, 16/16q24, 17/17p, 17q11-12, 17q21-25, 22, 22p, 22q13, and 22q
    Ring chromosomes, trisomy 7, and rearrangements of 11p and 12q
    Breakpoints: 1p, 7p22 (ETV1), 11q13-23, 20q13 (SRC), and 22q11-13 (NF2)
Myxofibrosarcoma
    Gains: 19p, 19q. Losses: 1q, 2q, 3p, 4q, 10q, 11q, and 13q (RB1). Amplification: 1, 5p, and 20q
Extraskeletal osteosarcoma
    Gains: 1q, 2, 8, and 17p11. Losses: 1q, 2, 5, 6, 12, 13, 14, 15, 16, 18, 19, 20, 21, and Y
Pleomorphic rhabdomyosarcoma
    Gains: 1p22-23, 5, 7p, 8, 14, 18/18, 20p, and 22. Losses: 2, 3p, 5q32-qter, 6, 10q23 (PTEN), 11, 13, 14, 15q21-q22, 16, 17, 18, 19, and Y
References: 738

Spindle cell/pleomorphic unclassified sarcoma
    Gains: 1p36-p31, 1q21-q24, 2p, 4p16, 5p, 5q34, 6q, 7p15-p22, 7q21-qter, 17q, 9q, 14q, 16p13, 17q, 19p13, 19q13.11-q13.2, 20q, and 21q. Losses: 1q32.1, 2p25.3, 2q36-q37, 8p23, 9p, 10q21-q23, 11q22, 13q14-q21, 16q11, and 16q23. Amplifications: 1p33-p34, 12q13-q15, 17cen-p11.2, and 17p13-pter
Skeletal osteosarcoma
    Gains and regional amplifications: 1q, 6p21-p12, 8q23-q24, and 17p13-p11.2 (TP53). Partial or complete loss: 6q. Rearrangements of chromosomes

Appendix F Known cancer predisposition genes

Gene | Genomic location | Cancer predisposition
SDHB | 1p36.13 | Gastrointestinal stromal tumour, paraganglioma, gastric stromal sarcoma, pheochromocytoma
MUTYH | 1p34.1 | Colorectal adenomas, colorectal adenomatous polyposis, gastric cancer (somatic)
UROD | 1p34.1 | Hepatocellular carcinoma
MPL | 1p34 | Familial essential thrombocythemia
GBA | 1q22 | Gaucher disease
SDHC | 1q23.3 | Gastrointestinal stromal tumour, paraganglioma and gastric stromal sarcoma
CDC73 | 1q31.2 | Parathyroid carcinoma and adenoma
FH | 1q43 | Leiomyomatosis and renal cell cancer
ALK | 2p23.2-p23.1 | Familial neuroblastoma
SOS1 | 2p22.1 | Noonan syndrome

MSH2 | 2p21-p16 | Colorectal cancer, hereditary nonpolyposis, type 1
MSH6 | 2p16.3 | Colorectal cancer, hereditary nonpolyposis type 5, endometrial cancer (familial), mismatch repair cancer syndrome
TMEM127 | 2q11.2 | Pheochromocytoma
ERCC3 | 2q14.3 | Xeroderma pigmentosum, group B
ABCB11 | 2q31.1 | Hepatocellular carcinoma
DIS3L2 | 2q37.1 | Perlman syndrome
VHL | 3p25.3 | Hemangioblastoma, pheochromocytoma, renal cell carcinoma, von Hippel-Lindau syndrome
XPC | 3p25.1 | Xeroderma pigmentosum, group C
BAP1 | 3p21.1 | Tumour predisposition syndrome
COL7A1 | 3p21.31 | Dystrophic epidermolysis bullosa
MLH1 | 3p22.2 | Colorectal cancer, hereditary nonpolyposis type 2, mismatch repair cancer syndrome
ATR | 3q23 | Familial cutaneous telangiectasia and cancer syndrome
GATA2 | 3q21.3 | Acute myeloid leukemia, myelodysplastic syndrome
PHOX2B | 4p13 | Neuroblastoma
KIT | 4q12 | Gastrointestinal stromal tumour, germ cell tumours, acute myeloid leukemia
PDGFRA | 4q12 | Gastrointestinal stromal tumour
SDHA | 5p15.33 | Paragangliomas
TERT | 5p15.33 | Acute myeloid leukemia, melanoma

APC | 5q22.2 | Adenomatous polyposis coli, Brain tumour-polyposis syndrome 2, Colorectal cancer (somatic), Gardner syndrome, gastric cancer (somatic), hepatoblastoma (somatic)
ITK | 5q33.3 | Lymphoproliferative syndrome 1
HFE | 6p22.2 | Hemochromatosis
FANCE | 6p21-p22 | Acute myeloid leukaemia
POLH | 6p21.1 | Xeroderma pigmentosum, variant type
PMS2 | 7p22.1 | Colorectal cancer, hereditary nonpolyposis type 4, mismatch repair cancer syndrome
EGFR | 7p11.2 | Adenocarcinoma of lung, non-small cell lung cancer
SBDS | 7q11.21 | Shwachman-Diamond syndrome
SLC25A13 | 7q21.3 | Hepatocellular carcinoma
MET | 7q31.2 | Hepatocellular carcinoma, renal cell carcinoma, osteofibrous dysplasia
PRSS1 | 7q34 | Pancreatic cancer
WRN | 8p12 | Werner syndrome
NBN | 8q21.3 | Acute lymphoblastic leukemia, Nijmegen breakage syndrome
EXT1 | 8q24.11 | Chondrosarcoma
RECQL4 | 8q24.3 | Rothmund-Thomson syndrome
DOCK8 | 9p24.3 | Hyper-IgE recurrent infection syndrome, autosomal recessive
MTAP | 9p21.3 | Malignant fibrous histiocytoma
CDKN2A | 9p21.3 | Melanoma, neural system tumour syndrome, orolaryngeal cancer, pancreatic cancer/melanoma syndrome

RMRP | 9p13.3 | Metaphyseal dysplasia without hypotrichosis
FANCG | 9p13 | Acute myeloid leukaemia
XPA | 9q22.33 | Xeroderma pigmentosum, group A
FANCC | 9q22.32 | Fanconi anemia, complementation group C
PTCH1 | 9q22.32 | Basal cell carcinoma
TGFBR1 | 9q22.33 | Multiple self-healing squamous epithelioma
TSC1 | 9q34.13 | Lymphangioleiomyomatosis, tuberous sclerosis-1
RET | 10q11.21 | Medullary thyroid carcinoma, multiple endocrine neoplasia, pheochromocytoma
BMPR1A | 10q23.2 | Polyposis syndrome
PTEN | 10q23.31 | Cowden syndrome 1, Endometrial carcinoma, malignant melanoma, PTEN hamartoma tumour syndrome, squamous cell carcinoma, head and neck, glioma susceptibility, prostate cancer
TNFRSF6 | 10q23.31 | Autoimmune lymphoproliferative syndrome, squamous cell carcinoma, autoimmune lymphoproliferative syndrome
SUFU | 10q24.32 | Medulloblastoma
HRAS | 11p15.5 | Costello syndrome, bladder cancer, thyroid carcinoma
FANCF | 11p15 | Acute myeloid leukaemia
WT1 | 11p13 | Mesothelioma, Wilms tumor, type 1
DDB2 | 11p11.2 | Xeroderma pigmentosum, group E, DDB-negative subtype

EXT2 | 11p11.2 | Exostoses, multiple, type 2
SDHAF2 | 11q12.2 | Familial paraganglioma
MEN1 | 11q13.1 | Adrenal adenoma, angiofibroma, carcinoid tumour of lung, lipoma, multiple endocrine neoplasia 1, parathyroid adenoma
ATM | 11q22.3 | Lymphoma, T-cell prolymphocytic leukemia, breast cancer
CBL | 11q23.3 | Noonan syndrome-like disorder
SDHD | 11q23.1 | Intestinal carcinoid tumours, Cowden syndrome, Merkel cell carcinoma, paraganglioma, gastric stromal sarcoma and pheochromocytoma
HMBS | 11q23.3 | Hepatocellular carcinoma
CDKN1B | 12p13.1 | Multiple endocrine neoplasia, type IV
CDK4 | 12q14.1 | Melanoma
PTPN11 | 12q24.13 | Juvenile myelomonocytic leukemia, Noonan syndrome 1
HNF1A | 12q24.2 | Familial hepatic adenoma
POLE | 12q24.33 | Colorectal cancer
BRCA2 | 13q13.1 | Fanconi anemia, complementation group D1, Wilms tumour, breast cancer (male), breast-ovarian cancer, glioblastoma, medulloblastoma, pancreatic cancer, prostate cancer
GJB2 | 13q12.11 | Vohwinkel syndrome
RB1 | 13q14.2 | Bladder cancer, osteosarcoma, retinoblastoma, small cell cancer of the lung
ERCC5 | 13q33.1 | Xeroderma pigmentosum, group G
MAX | 14q23.3 | Pheochromocytoma

SERPINA1 | 14q32.13 | Thyroid cancer
DICER1 | 14q32.13 | Pleuropulmonary blastoma, rhabdomyosarcoma
BUB1B | 15q15.1 | Colorectal cancer
FAH | 15q25.1 | Hepatocellular carcinoma
BLM | 15q26.1 | Bloom syndrome
TSC2 | 16p13.3 | Lymphangioleiomyomatosis, tuberous sclerosis-2
ERCC4 | 16p13.12 | Fanconi anemia, complementation group Q, Xeroderma pigmentosum, group F, Cockayne syndrome
PALB2 | 16p12.2 | Fanconi anemia, complementation group N, breast cancer, pancreatic cancer
CYLD | 16q12.1 | Brooke-Spiegler syndrome, cylindromatosis, trichoepithelioma
CDH1 | 16q22.1 | Endometrial carcinoma, gastric cancer, ovarian carcinoma, breast cancer, prostate cancer
FANCA | 16q24.3 | Fanconi anemia, complementation group A
TP53 | 17p13.1 | Adrenal cortical carcinoma, breast cancer, choroid plexus papilloma, colorectal cancer, hepatocellular carcinoma, Li-Fraumeni syndrome, nasopharyngeal carcinoma, osteosarcoma, pancreatic cancer, basal cell carcinoma 7, glioma susceptibility
FLCN | 17p11.2 | Colorectal cancer, renal carcinoma
RAD51D | 17q12 | Familial breast-ovarian cancer
NF1 | 17q11.2 | Neurofibromatosis, type 1

BRCA1 | 17q21.31 | Familial breast-ovarian cancer, pancreatic cancer
STAT3 | 17q21.2 | Autoimmune disease, Hyper-IgE recurrent infection syndrome
SMARCE1 | 17q21.2 | Familial meningioma
TRIM37 | 17q22 | Breast cancer
BRIP1 | 17q23.2 | Breast cancer, Fanconi anemia, complementation group J
PRKAR1A | 17q24.2 | Adrenocortical tumour, Carney complex type 1, pigmented nodular adrenocortical disease, primary
AXIN2 | 17q24.1 | Colorectal cancer, oligodontia-colorectal cancer syndrome
RAD51C | 17q22 | Fanconi anemia, complementation group O, familial breast-ovarian cancer
RHBDF2 | 17q25.1 | Tylosis with esophageal cancer
SMAD4 | 18q21.2 | Juvenile polyposis/hereditary hemorrhagic telangiectasia syndrome, pancreatic cancer (somatic), juvenile intestinal polyposis
SETBP1 | 18q21.1 | Schinzel-Giedion syndrome
ELANE | 19p13.3 | Cyclic neutropenia, severe congenital neutropenia
STK11 | 19p13.3 | Melanoma, pancreatic cancer, Peutz-Jeghers syndrome, testicular cancer
SMARCA4 | 19p13.2 | Coffin-Siris syndrome 4, rhabdoid tumour predisposition syndrome 2
CEBPA | 19q13.11 | Acute myeloid leukaemia

ERCC2 | 19q13.32 | Cerebrooculofacioskeletal syndrome 2, trichothiodystrophy 1 photosensitive, xeroderma pigmentosum group D
POLD1 | 19q13.33 | Colorectal cancer
RUNX1 | 21q22.12 | Acute myeloid leukaemia, platelet disorder, familial, with associated myeloid malignancy
SMARCB1 | 22q11.23 | Coffin-Siris syndrome, Rhabdoid tumours, Schwannomatosis-1
LZTR1 | 22q11.21 | Schwannomatosis-2
CHEK2 | 22q12.1 | Li-Fraumeni syndrome, osteosarcoma, breast cancer, colorectal cancer, prostate cancer
NF2 | 22q12.2 | Neurofibromatosis type 2, schwannomatosis, meningioma
WAS | Xp11.23 | Neutropenia, thrombocytopenia, Wiskott-Aldrich syndrome
SH2D1A | Xq25 | Lymphoproliferative syndrome
GPC3 | Xq26.2 | Wilms tumour
DKC1 | Xq28 | Dyskeratosis congenita
SRY | Yp11.2 | Hepatocellular carcinoma

The information in this table was sourced from the Online Mendelian Inheritance in Man database and the Catalogue of Somatic Mutations in Cancer database (citations: 134).

Appendix G Candidate genes used for variant prioritisation based on a priori knowledge of cancer biology

Gene name Chromosome Start End No. variants
APC ARID1A ATM ATR AXIN AXIN BARD BLM BRCA BRCA BRIP BUB1B C17orf CD99 Y CDH CDKN2A CHEK CHEK DDB DICER

DKC1 X DNA ELF ELF ERCC ERCC ERCC ERCC ERF ERG ETS ETS ETV ETV ETV ETV EWSR EXT EXT FAM175A FANCA FANCB X FANCC FANCD FANCE FANCF FANCG FANCI FANCL FANCM FH FLI HNF4A IDH IDH KIF1B KIT LIG LIG MDM MEN

MET MLH MLH MRE11A MSH MSH MSH MUTYH NBN NEIL NF NF PALB PMS PMS POLH PPARG PRKAR1A PTCH PTEN PTPN RAD RAD51C RAD51D RB RECQL RET RMI RMI RPA RPA RPS SDHA SDHB SDHC SDHD SMARCA SMARCB SPDEF SPI SQSTM

STK TAF TGFBR TNFRSF11A TOP TOP3A TP TP53BP TSC TSC VHL WRN WT XPA XPC XRCC

Start and End: the chromosome locations of the start and end of the gene (including ± 25 kb).
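The ± 25 kb padding described in the footnote can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the gene name `GENE_X`, its coordinates, the variant positions, and the helper names are hypothetical and are not taken from the thesis, which does not state how the windows were computed in practice.

```python
from dataclasses import dataclass

PAD_BP = 25_000  # flank added on each side of the gene (the ± 25 kb in the footnote)


@dataclass(frozen=True)
class GeneWindow:
    name: str
    chromosome: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def padded_window(name, chromosome, gene_start, gene_end, pad=PAD_BP):
    """Extend a gene interval by `pad` bp on each side, never going below position 1."""
    return GeneWindow(name, chromosome, max(1, gene_start - pad), gene_end + pad)


def variant_in_window(window, chromosome, position):
    """True if a variant at (chromosome, position) lies inside the padded window."""
    return chromosome == window.chromosome and window.start <= position <= window.end


def count_variants(window, variants):
    """Count variants, given as (chromosome, position) pairs, inside the window."""
    return sum(variant_in_window(window, c, p) for c, p in variants)


# Hypothetical gene and variants, for illustration only.
window = padded_window("GENE_X", "1", 100_000, 150_000)
variants = [("1", 80_000), ("1", 74_999), ("1", 175_000), ("2", 120_000)]
n_variants = count_variants(window, variants)
```

With these made-up inputs the window spans positions 75,000 to 175,000 on chromosome 1, so the first and third variants are counted and the other two are not.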

Appendix H Genes in which variants were also prioritised using the candidate gene prioritisation strategy

Gene | Chromosome | No. variants
ACCS | 11 | 8
ACP
ACRV
ACYP
AIMP2 | 7 | 3
ALX
ANKRD
ARHGAP
ARHGDIG | 16 | 2
ARHGEF
ATP13A2 | 1 | 4
ATP1B
ATP5D | 19 | 3
BRK1 | 3 | 3
C11orf
C11orf
C19orf
C22orf
C5orf
C9orf

CAMKK
CAT | 11 | 2
CCDC
CDC42BPG
CFAP
CHCHD
COX6B
CPM | 12 | 4
DCTN
DERL
DHFR | 5 | 3
DHX
DLAT | 11 | 1
DMRTC
EIF2AK1 | 7 | 1
EIF2B
EPCAM | 2 | 2
EPM2AIP1 | 3 | 1
EVI2A | 17 | 2
FAM20A | 17 | 3
FAM20A, PRKAR1A | 17 | 1
FBXO
FDFT1 | 8 | 4
FLII | 17 | 7
FNDC
FRY | 13 | 1
GAS2L
GATA4 | 8 | 3
GPT | 8 | 2
GSK3A | 19 | 3
GTPBP2 | 6 | 2
HAUS
HEATR
HELQ | 4 | 11
HNRNPK | 9 | 4
HSCB | 22 | 2
IL
INTS
IRAK2 | 3 | 4
ITFG
ITGAE

KLC
LOC
LOC
LPAR
LRRC
LRRC14B | 5 | 2
LRRFIP2 | 3 | 2
LYPD
MAD2L1BP | 6 | 1
MAP3K2 | 2 | 1
MAP4K
MFSD3 | 8 | 1
MIDN | 19 | 3
MIEF
MIS18BP
MMP
MPZ | 1 | 1
MRPL
MRPS18C | 4 | 1
MYBPC
NDUFAB
NPAT | 11 | 1
NR1H
ORMDL1 | 2 | 3
OSGIN2 | 8 | 2
PACSIN1 | 6 | 1
PADI2 | 1 | 9
PDIA
PGD | 1 | 3
PIGO | 9 | 10
PIGV | 1 | 2
PIH1D
PKD
PLA2G4C
PLCG1-AS
POLG
PPP1R16A | 8 | 2
R3HDML | 20 | 6
RBM
RCBTB
RGS

RHBDD
RNPEP | 1 | 5
RNU6-28P | 15 | 5
RPL10A | 6 | 1
RUFY
SHMT
SLC2A
SLC9A3R
SMCR
SMYD
SRP
STOML2 | 9 | 3
STT3A | 11 | 3
TANGO
TEAD3 | 6 | 4
TESK2 | 1 | 3
TMEM
TMEM8A
TOE1 | 1 | 1
UPK1A | 19 | 3
VAT
VCP | 9 | 5
VPS9D
VRK2 | 2 | 1
WRAP
XPO5 | 6 | 2
XRN1 | 3 | 1
ZAR1L | 13 | 4
ZC2HC1C | 14 | 2
ZNF
ZNF
ZNF

Appendix I Patient 1-II-2: Copy number variation by chromosome

Figure: per-chromosome copy number plots (y-axis: Log2(T/R); x-axis: Index). Black: normalised log ratios. Red: mean values among points in each segment obtained by circular binary segmentation.
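The two plotted quantities, per-target log2 ratios and per-segment means, can be sketched as follows. This is a minimal illustration under stated assumptions: T/R is read here as test depth over reference depth, the depths are assumed to be already normalised, the function names and input values are hypothetical, and the segment boundaries are supplied by hand. The circular binary segmentation change-point search itself is not implemented.

```python
import math
from statistics import fmean


def log2_ratios(test_depth, reference_depth):
    """Per-target log2(T/R); both depths must be positive and of equal length."""
    if len(test_depth) != len(reference_depth):
        raise ValueError("depth lists must have the same length")
    ratios = []
    for t, r in zip(test_depth, reference_depth):
        if t <= 0 or r <= 0:
            raise ValueError("depths must be positive to take a log ratio")
        ratios.append(math.log2(t / r))
    return ratios


def segment_means(ratios, segments):
    """Mean log2 ratio within each (start, end) half-open index range."""
    return [fmean(ratios[start:end]) for start, end in segments]


# Hypothetical depths: the last two targets have doubled coverage in the test sample.
test_depth = [100, 100, 200, 200]
reference_depth = [100, 100, 100, 100]

ratios = log2_ratios(test_depth, reference_depth)   # the black points
means = segment_means(ratios, [(0, 2), (2, 4)])     # the red segment lines
```

In a plot of this kind, each value in `ratios` would be one black point against its index, and each value in `means` would be drawn as a red horizontal line across its segment.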

Appendix J Patient 2-II-1: Copy number variation by chromosome

Figure: per-chromosome copy number plots (y-axis: Log2(T/R); x-axis: Index). Black: normalised log ratios. Red: mean values among points in each segment obtained by circular binary segmentation.

Tumor suppressor genes
Dr. S Hosseini-Asl

What is a Tumor Suppressor Gene?

A tumor suppressor gene is a type of cancer gene that is created by loss-of-function mutations. In contrast to