Interrogation of rare functional variation within bipolar disorder and suicidal behavior cohorts

University of Iowa
Iowa Research Online
Theses and Dissertations
Spring 2018
Interrogation of rare functional variation within bipolar disorder and suicidal behavior cohorts
Eric Thayne Monson, University of Iowa
Copyright 2018 Eric Thayne Monson
This dissertation is available at Iowa Research Online:
Recommended Citation: Monson, Eric Thayne. "Interrogation of rare functional variation within bipolar disorder and suicidal behavior cohorts." PhD (Doctor of Philosophy) thesis, University of Iowa,
Part of the Genetics Commons

INTERROGATION OF RARE FUNCTIONAL VARIATION WITHIN BIPOLAR DISORDER AND SUICIDAL BEHAVIOR COHORTS
by
Eric Thayne Monson
A thesis submitted in partial fulfillment of the requirements for the Doctor of Philosophy degree in Genetics in the Graduate College of The University of Iowa
May 2018
Thesis Supervisor: Associate Professor Virginia L. Willour

Copyright by ERIC THAYNE MONSON 2018
All Rights Reserved

Graduate College
The University of Iowa
Iowa City, Iowa
CERTIFICATE OF APPROVAL
PH.D. THESIS
This is to certify that the Ph.D. thesis of Eric Thayne Monson has been approved by the Examining Committee for the thesis requirement for the Doctor of Philosophy degree in Genetics at the May 2018 graduation.
Thesis Committee: Virginia L. Willour, Thesis Supervisor; Patrick Breheny; Thomas L. Casavant; Shizhong Han; Thomas H. Wassink

To my family

ACKNOWLEDGEMENTS
I want to start by offering my heartfelt thanks to my mentor, Virginia Willour, for all of her patient guidance and instruction throughout my training. She helped me learn the importance of keeping the purpose of our work at the forefront of our minds; no matter how many times we might need to re-read a draft or re-analyze a result, we do it right and with renewed focus because the work matters. I also want to thank my thesis committee for always holding me to a high standard and for the thoughtful insights and critical assessments of my work that have allowed this project to evolve. I want to thank my colleagues and lab-mates, Marie Gaine and Sophie Gaynor, not only for their incredible effort and exceptional contributions to these projects but also for putting up with my conversations with my computer, constantly talking through ideas with me, and periodically leaving something sugary in the breakroom to keep us all going. I also could never have succeeded in this endeavor without the incredible support of my wife and best friend, Alison. Her endless willingness to read my manuscripts, endure my anxieties, serve as a key sounding board for my ideas, and ensure that I occasionally ate and slept during the many long days and nights of troubleshooting programs and revisions made this work possible. She also kept our tiny humans alive throughout this endeavor, for which she deserves explicit credit. I want to thank my children (the tiny humans) for their patience and support through this process. There were many days when daddy had to finish revisions rather than play outside, and numerous evenings when they had to endure practice presentations and out-loud readings of manuscripts as bedtime stories.

I also wish to thank my family: my sister, Michelle, for commiserating with me throughout the writing of our dissertations; my parents, Lanny and Kathryn, for always showing excitement and interest in my work; and my brothers, Michael and Nathan, with whom I shared the joys of coding and hunting for errors. Finally, I wish to extend a special acknowledgement to my sister, Angela, who will never get to see this work but who served as an instrumental source of inspiration in the effort to keep pursuing answers.

ABSTRACT
Suicidal behavior represents the most severe, yet inherently preventable, outcome of psychiatric disease. Despite tremendous efforts to improve the awareness and treatment of psychiatric illness, suicidal behavior rates have been on the rise. The greatest challenge to confronting this crisis is the effective identification and treatment of those at risk for suicide. This challenge has been difficult to address due, in part, to the lack of a clear biological basis for suicidal behavior. Toward addressing this knowledge gap, studies have identified evidence of a significant heritable component to suicidal behavior. Subsequent genetic research efforts have focused on the examination of common sites of genetic variation within candidate genes and throughout the genome. These efforts have identified many potentially important risk loci, but the majority of the risk expected to arise from genetic variation remains unexplained by current data. The primary objective of this dissertation was to examine the contribution of largely unexplored rare and potentially damaging genetic variation to suicidal behavior. To do this, targeted next-generation sequencing approaches were employed within a cohort of individuals diagnosed with bipolar disorder, a group particularly enriched for suicidal behavior. Sequence data were generated covering essentially all protein-coding regions of the human genome (the "exome"), with expanded sequencing around and within candidate genes hypothesized to play a role in suicidal behavior risk. The secondary objective of this dissertation was to assess rare variation in bipolar disorder through sequenced pedigrees, with follow-up in a large collaborative sequencing dataset comparing bipolar disorder cases with normal controls.

These objectives were addressed through the thoughtful application of diverse and complementary methods, selected to investigate individual variants, genes, and biological pathways. This approach allowed us to examine the potential impact of rare genetic variation both within focused regions and across complex biological pathways that could be disrupted by damaging variation in many different genes. The presented efforts represent the largest examinations of rare functional variation in suicidal behavior and bipolar disorder performed to date. No individual variant or gene survived correction for multiple testing for either phenotype. These results are consistent with other initial sequencing efforts in complex psychiatric phenotypes and suggest that larger samples will likely be required to identify significant associations for single variants and genes. Within pathway analyses, however, we identified a significant enrichment of rare damaging variation segregating within bipolar disorder pedigrees in genes that have been implicated in de novo studies of autism. This finding was further replicated within three large case/control sequencing samples, supporting emerging evidence of a potential overlap of risk loci for autism and bipolar disorder. Many additional results approached significance and merit further consideration. These results offer potential candidate genes and pathways that could be utilized in future sequencing efforts for suicidal behavior and bipolar disorder. In addition, these efforts produced highly valuable resources in the form of datasets strongly enriched for novel rare loci, which can contribute significantly to ongoing efforts to investigate bipolar disorder and suicidal behavior. These data can be combined with other emerging datasets in more powerful meta- and mega-analyses to confidently identify risk loci for both phenotypes.

PUBLIC ABSTRACT
Despite being inherently preventable, suicidal behavior has been difficult to address for at least two major reasons: difficulty in identifying which individuals are at greatest risk, and few known treatments that reliably decrease the risk of death from suicide. Both of these problems stem from a poor understanding of the biology behind suicidal behavior risk. There is strong evidence that genetics contributes significantly to this risk, and current genetic studies have identified potential risk genes. However, we believe that some genetic variants and mutations that increase the risk for suicidal behavior can only be identified through complete genetic sequence analysis. This dissertation focuses on searching for these rare potential risk variants through sequencing explorations in people diagnosed with bipolar disorder who have previously attempted suicide, compared with those who have not. Rare genetic differences between individuals diagnosed with bipolar disorder and individuals with no psychiatric disease are also explored. Our results do not confidently identify any specific genes or genetic variants as being strongly associated with either suicidal behavior or bipolar disorder. However, we identified several promising results that merit additional investigation. These results may aid in the identification of key genetic risk factors in follow-up genetic studies.

TABLE OF CONTENTS
List of Tables
List of Figures
Chapter 1, Introduction
  Overview
  Background: Psychiatric Genetics Research
  Background: The Basis of Bipolar Disorder
  Background: The Basis of Suicidal Behavior
  RareBLISS Exome Project
  Suicidal Behavior Targeted Sequencing Project
  Figures
Chapter 2, complete published manuscript, Exome Sequencing of Familial Bipolar Disorder
  Study Contribution
  Copyright Statement
  Abstract
  Importance
  Objective
  Design, Setting, and Participants
  Main Outcomes and Measures
  Results
  Conclusions and Relevance
  Introduction
  Key Points
  Question
  Findings
  Meaning
  Methods
  Primary Family and Case-Control Samples
  Exome Sequencing
  Family Analysis
  Case-Control Analysis
  Meta-Analysis
  Gene Set Enrichment Analysis
  Results
  Family Analysis
  Case-Control Follow-Up
  Variant-Level Analysis
  Gene-Burden Analysis
  Gene Set Enrichment
  Discussion
  Conclusions
  Conflict of Interest Disclosures
  Funding Support
  Role of the Funder/Sponsor
  Additional Contributions
  Figures
  Tables
  Supplemental Information
Chapter 3, complete published manuscript, Assessment of Whole-Exome Sequence Data in Attempted Suicide within a Bipolar Disorder Cohort
  Study Contribution
  Abstract
  Introduction
  Materials and Methods
  Sample Collection
  Sequencing and Data Preparation
  Statistical Analyses
  Results
  Individual Variant Test Results
  Gene Level Tests
  Pathway Analyses
  Discussion
  Pathway Exploration
  The Contribution of High-Throughput Investigation Efforts in Suicide
  Study Limitations
  Conclusion
  Acknowledgements
  Statement of Ethics
  Disclosure Statement
  Figures
  Tables
  Supplemental Information
Chapter 4, complete published manuscript, Whole-Gene Sequencing Investigation of SAT1 in Attempted Suicide
  Study Contribution
  Abstract
  Introduction
  Materials and Methods
  Study Subjects
  Sequencing Data Preparation
  Statistical Analysis
  Power Analysis
  Results
  Single Variant Testing
  Previously Associated Variants
  Region-Based And Haplotype Assessments
  Discussion
  Study Population and Size
  Linkage Disequilibrium of the Region
  Other Explanations for Observed Expression Changes
  Study Limitations
  Conclusion
  Acknowledgements
  Conflicts of Interest
  Figures
  Tables
  Supplemental Information
Chapter 5, complete published manuscript, A Targeted Sequencing Study of Glutamatergic Candidate Genes in Suicide Attempters with Bipolar Disorder
  Study Contribution
  Abstract
  Introduction
  Materials and Methods
  Sample Collection
  Target Determination
  Next Generation Sequencing
  Statistical Analyses
  Results
  Discussion
  Acknowledgments
  Conflicts of Interest
  Figures
  Tables
  Supplemental Information
Chapter 6, complete published manuscript, Targeted Sequencing of FKBP5 in Suicide Attempters with Bipolar Disorder
  Study Contribution
  Abstract
  Introduction
  Materials and Methods
  Ethics Statement
  Subjects
  Definition of Suicide Subjects
  Population Stratification
  Target Determination
  SureSelect Technology
  Sample Processing Pipeline
  Statistical Analyses
  Genotyping
  Haplotype Analysis
  Results
  Dataset Quality
  Single-Variant Analysis
  Replication of Results
  Replication of Prior Findings
  Sex-Specific Analyses
  Gene-Level Analysis
  Haplotype Analysis
  Discussion
  Funding
  Competing Interests
  Figures
  Tables
  Supplemental Information
Chapter 7, Conclusion
  Summary
  Limitations
  Future Directions
  Conclusion
  Figures
Appendix A: Workflow for the RareBLISS Exome-Sequencing Project
Appendix B: Workflow for the SB Targeted Candidate Gene Sequencing Project
References

LIST OF TABLES
Table 2.1: Overview of Segregating Variants per Family by Allele Frequency and Functional Annotation Strata
Table 2.2: Variant-Level Meta-analysis Showing All Variants with a Meta-analytic Association P <
Table 2.3: Gene-Based Meta-analysis Showing All Genes with a Meta-analytic Association P <
Table 3.1: Gene Signals for Coding and Noncoding Analyses
Table 3.2: Pathway Analysis Results
Table 4.1: Covered Variants Previously Associated with Suicidal Behavior and/or SAT1 Expression
Table 4.2: Gene-Burden Results
Table 5.1: Sample Set Demographics
Table 5.2: Top Individual-Variant Results
Table 5.3: Gene-Level Results (P < 0.05)
Table 6.1: Single-Variant Results with a P-Value <
Table 6.2: Gene-Level Results Using Two Minor Allele Thresholds
Table 6.3: Sex-Specific Gene-Level Results Using Two Minor Allele Thresholds
Table 6.4: Haplotype Results Generated using Haploview

LIST OF FIGURES
Figure 1.1: Heritabilities of Five Major Psychiatric Phenotypes
Figure 1.2: Significant Loci Identified per Cases Assessed in Psychiatric GWAS
Figure 1.3: Distribution of Variants in Various Disease Risk Hypotheses
Figure 2.1: Pedigrees of the 8 Families With Bipolar Disorder (BD) Selected for Sequencing
Figure 3.1: Manhattan Plot of Results for Association Tests with All Variants in the Dataset
Figure 3.2: Representation of the Shared Genes in 3 of the Top Pathway Association Results that Demonstrated Strong Overlap of Included Genes
Figure 4.1: Representation of the SAT1 Locus Variants
Figure 4.2: Representation of the LD Structure of the Sequenced Region of Variants
Figure 5.1: Effect of Drug Treatments on Glutamatergic Signaling
Figure 6.1: Schematic of the Top Variant in the FKBP5 Gene alongside Previous Findings
Figure 6.2: Haplotype Block Structures in the FKBP5 Region
Figure 7.1: Overlap of Top Gene Results from SB and BD RareBLISS Data
Figure A.1: Data Preparation Workflow and Contributions
Figure A.2: Bipolar Disorder Exome Project Statistical Analyses and Contributions
Figure A.3: Suicide Exome Project Statistical Analyses and Contributions
Figure B.1: Data Preparation Workflow and Contributions
Figure B.2: SB Candidate Gene Project Statistical Analyses and Contributions

CHAPTER 1, INTRODUCTION
OVERVIEW
This dissertation presents a comprehensive examination of whole-exome and targeted sequencing datasets. The whole-exome sequencing data were generated from a large cohort of individuals diagnosed with bipolar disorder (BD) to create the Rare Bipolar Loci Identification through Synaptome Sequencing (RareBLISS) dataset. This was accomplished through a major collaborative effort across three institutions, with responsibilities distributed as outlined in Appendix A. We analyzed this dataset in two ways. First, we assessed BD pedigrees with the goal of identifying high-penetrance variants segregating with the BD phenotype. Second, we assessed BD individuals with a history of suicidal behavior (SB), defined for this project as at least one past suicide attempt. BD suicide attempters were compared with BD non-attempters with the goal of identifying loci specific to the risk for SB. In total, the RareBLISS projects generated one co-authored manuscript (BD project) and one first-author manuscript (SB project), presented in this dissertation as chapters 2 and 3, respectively. In addition to the exome studies, we also completed three SB targeted sequencing projects focused on capturing additional noncoding regulatory sequence data from promising SB candidate genes. The SB targeted sequencing projects were generated and assessed within the Willour lab using the collaborative exome-sequencing project as a template. These targeted sequencing projects generated one first-author manuscript and two co-authored manuscripts, presented as chapters 4-6 of this dissertation. Combined, the SB exome and targeted sequencing projects represented approximately 85% of my time commitment as a graduate student, and the BD exome project represented approximately 15%.
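The pedigree analysis described above looks for rare variants that segregate with BD within a family. As an illustration only, the following Python sketch applies one simple, assumed segregation rule: keep a variant if every affected family member with a called genotype carries at least one alternate allele. The function name, data layout, and the choice to tolerate missing calls are hypothetical; the criteria actually used in the family analysis (chapter 2) may differ, for example by also applying allele-frequency and functional-annotation filters.

```python
from typing import Dict, List, Optional, Set

# Hypothetical layout: variant ID -> {individual ID -> alternate-allele count (0, 1, 2), or None if uncalled}.
GenotypeTable = Dict[str, Dict[str, Optional[int]]]


def segregating_variants(genotypes: GenotypeTable, affected: Set[str]) -> List[str]:
    """Return variants carried by every affected individual that has a called genotype.

    Assumed rule for illustration: missing calls are ignored, but at least one
    affected individual must be called, and no called affected individual may be
    a non-carrier. Unaffected relatives are not used by this simple filter.
    """
    kept: List[str] = []
    for variant, calls in genotypes.items():
        affected_calls = [calls.get(person) for person in affected]
        called = [g for g in affected_calls if g is not None]
        if called and all(g > 0 for g in called):
            kept.append(variant)
    return kept


# Toy pedigree with made-up IDs: A1 and A2 are affected, U1 is unaffected.
example = {
    "var1": {"A1": 1, "A2": 1, "U1": 0},
    "var2": {"A1": 1, "A2": 0, "U1": 1},
    "var3": {"A1": 2, "A2": None, "U1": 0},
}
print(segregating_variants(example, affected={"A1", "A2"}))  # ['var1', 'var3']
```

Not requiring unaffected relatives to be non-carriers is a design choice of this sketch, since a relative may carry a risk variant without (yet) having developed BD; a stricter rule could add that condition.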

Finally, chapter 7 offers a critical evaluation of the cumulative contribution these efforts made to the field, the rationale for our study design choices, and the limitations imposed by these designs.
BACKGROUND: PSYCHIATRIC GENETICS RESEARCH
Investigations into the basis of complex psychiatric phenotypes have frequently shown evidence that genetic variation plays an important role. This is demonstrated by a range of estimated heritabilities, several examples of which are presented in Figure 1.1. This evidence has driven many efforts to understand the source of this genetic risk, and these investigations have generally followed a consistent pattern. Most initial studies focused on specific candidate genes predicted to be important to a given psychiatric phenotype. These candidate genes were typically selected based on known therapeutic targets and other hypotheses, and they have been the source of a number of associations with psychiatric disease. As efforts have continued and the lists of potential candidate genes have grown, additional techniques and technologies for genome-wide investigation have been increasingly employed. These genome-wide methods have primarily focused on common sites of variation under the premise of the common disease, common variant (CDCV) hypothesis (1). This hypothesis suggests that common variants drive disease risk and should be detectable by assessing variants that represent the majority of linkage disequilibrium regions throughout a candidate gene or across the complete genome (2). The largest efforts following this hypothesis applied genome-wide association studies (GWAS) to thousands of samples, a method that has shown varying levels of success depending on the phenotype investigated (Figure 1.2).
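Under the CDCV framework described above, a GWAS tests each common variant separately and then applies a stringent genome-wide threshold. The following Python sketch illustrates that idea with a basic allelic chi-square test on a 2x2 table of allele counts. It is a simplification (large GWAS typically use regression models with covariates such as ancestry components), the function name is hypothetical, and the 5 x 10^-8 threshold is the conventional genome-wide significance level rather than a value taken from this dissertation.

```python
import math
from typing import Tuple

GENOME_WIDE_ALPHA = 5e-8  # conventional genome-wide significance threshold


def allelic_chi_square(case_alt: int, case_ref: int,
                       ctrl_alt: int, ctrl_ref: int) -> Tuple[float, float]:
    """Pearson chi-square test (1 df, no continuity correction) on allele counts.

    Rows are cases and controls; columns are alternate and reference allele counts.
    Returns (statistic, p-value).
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][j] + table[1][j] for j in range(2)]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            stat += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Made-up allele counts for one SNP in 1,000 cases and 1,000 controls (2,000 alleles each).
stat, p = allelic_chi_square(600, 1400, 500, 1500)
print(f"chi2={stat:.2f}, p={p:.1e}, genome-wide significant: {p < GENOME_WIDE_ALPHA}")
```

With these made-up counts the nominal p-value is on the order of 10^-4, far from genome-wide significance, which illustrates how large samples become necessary when each variant is tested on its own.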

Estimates from these large GWAS efforts have consistently demonstrated that common variation explains only part of the risk predicted by the heritability estimates for each phenotype (3). This "missing heritability" has led to the adoption of an additional hypothesis, the common disease, rare variant (CDRV) hypothesis. This hypothesis suggests that genetic risk for a common disease may arise from a broad diversity of rare mutations with comparatively large effect sizes (3). Therefore, sites of rare variation that contribute significantly to risk for the assessed phenotypes may not be well interrogated by existing common variant analyses. These rare sites can only be appropriately interrogated through sequencing (Figure 1.3), a method that has only recently become economical to perform on more than a small region at a time, and which forms the basis of the efforts described within this dissertation.
BACKGROUND: THE BASIS OF BIPOLAR DISORDER
BD is a severe psychiatric disorder characterized clinically by extremes of mood and behavior, including manic, hypomanic, and/or depressive episodes as defined within the Diagnostic and Statistical Manual of Mental Disorders (DSM-5) (4). BD is also quite common, with an estimated lifetime prevalence within the United States of 1.0% for type I bipolar disorder and 1.1% for type II bipolar disorder (5). A number of treatments are available for BD, but their efficacy typically varies widely from patient to patient, and they often have difficult adverse effect profiles and limited therapeutic windows, making treatment compliance and safety a concern (6). These inconsistencies arise from a general lack of understanding of the biological basis of BD (6), a fact underscored by decades of research that have not yet identified the therapeutic mechanisms of long-standing treatments such as lithium (7). A clear understanding of the biological basis of BD could help direct the development of better-targeted treatments and could significantly improve treatment efficacy, compliance, and quality of life for affected patients. Efforts to understand the biological basis of BD have provided evidence of a particularly strong contribution of genetic variation to disease risk, with an estimated heritability of 85% (8). Such evidence suggests that BD should be particularly amenable to genetic assessment, and this observation has led to a wide diversity of projects that have identified regions of potential importance to BD. Many recent efforts in BD have focused on common variation within large GWAS (9-13). These efforts have now produced several consistently replicated risk loci for BD (14), but these loci explain only a small proportion of the estimated heritability for BD (13, 15). Such results have prompted several initial studies to investigate the contribution of rare variation to BD risk under the CDRV hypothesis. These initial efforts have focused primarily on family-based analyses (16-21), along with one small case-control assessment (22). Their results primarily suggest patterns of enrichment of rare functional variation within large pathways, such as the regulation of intracellular signaling (18) and neuronal excitability (20). However, the consensus of these reports also points to the need for additional and larger sequencing efforts to confidently identify key genes and regions contributing to BD risk.
BACKGROUND: THE BASIS OF SUICIDAL BEHAVIOR
SB represents the most severe psychiatric outcome and occurs at elevated rates in many psychiatric disorders, including depression, schizophrenia, and, particularly, BD. Of all individuals diagnosed with BD type I or type II, 80% report suicidal ideation, 30-50% will make at least one suicide attempt in their lifetime, and an estimated 18.9% will die from suicide (23, 24). SB is also strikingly common: suicide consistently ranks among the top ten causes of death across all age groups within the United States (25) and is currently the second leading cause of death among young adults worldwide (26). In addition, the rate of death by suicide within the United States has increased steadily each year since the late 1990s, with a particularly accelerated rise within the last decade (27). Concern over such statistics has prompted greater efforts to understand and prevent SB, but the biological basis of SB remains unclear. This knowledge gap prevents the development of needed treatments and empirically based risk assessments to address the rising trend of SB. Family and twin studies have produced key evidence supporting a significant contribution of genetics to SB, with heritability estimates of 30-50% (28). Some of the liability for SB is expected to be explained by comorbid psychiatric phenotypes. The remaining liability, however, is suspected to arise from independent factors unique to SB, such as impulsive-aggressive behavioral traits (29, 30). Efforts to explore the genetic basis of SB have followed the pattern described for other complex psychiatric phenotypes: initial studies focused primarily on candidate genes suspected to be important to SB, followed by larger, genome-wide efforts. SB GWAS have identified several potentially important loci (31-35), but none of these sites has yet been consistently replicated across studies. Although larger sample sizes will likely identify new risk loci within SB GWAS, estimates from existing GWAS data indicate that the heritability explained by common variation, even when including variants with association P-values as high as 0.5, reaches only approximately 1.6% (33). This finding suggests that additional sources of variation contribute significantly to the genetic component of SB risk. At this time, however, no efforts have been made to explore the contribution of rare functional variation to SB risk, with the exception of a few candidate gene investigations (36, 37), leaving the potential role of the CDRV hypothesis in SB largely unexplored.
RAREBLISS EXOME PROJECT
The RareBLISS exome-sequencing project was generated to complement these genetic research efforts by providing a comprehensive exploration of rare functional variation within the severe psychiatric phenotypes of SB and BD in a large cohort of BD subjects. To accomplish this, we were fortunate to have access to several thousand samples collected as part of the National Institute of Mental Health Genetics Initiative and the Chicago, Hopkins, National Institute of Mental Health Intramural Program (19, 20). These samples comprise individuals diagnosed with BD, accompanied by rich phenotypic information, including a thorough SB history, obtained during collection and interviewing. A large subset of these individuals underwent whole-exome sequencing, and the resulting data were assessed in two separate projects investigating the contribution of rare variation to SB and BD under the CDRV hypothesis. The BD portion of the RareBLISS exome project is described in chapter 2 through the published manuscript, Exome Sequencing of Familial Bipolar Disorder (38). This manuscript incorporates whole-exome sequences from both familial (8 pedigrees) and case/control (2,277 subjects: 1,135 BD cases and 1,142 normal controls) samples. Primary analyses within the family data sought to identify potentially penetrant rare variants of importance to BD, followed by replication efforts within the RareBLISS case/control samples and two additional large replication samples from collaborators.
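Because individual rare variants are usually too infrequent to test one at a time, case/control analyses such as the follow-up described above often aggregate them by gene. The following Python sketch shows one simple collapsing (burden) approach, offered as an illustration under stated assumptions rather than as the method used in chapter 2: each subject counts as a carrier if they carry at least one qualifying rare variant in the gene, and carrier counts in cases and controls are compared with a two-sided Fisher's exact test. The function names are hypothetical; the case and control totals match the RareBLISS case/control sample described above, but the carrier counts are invented for illustration.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same margins
    that is no more probable than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x with all margins fixed.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    observed = prob(a)
    p_value = 0.0
    for x in range(max(0, col1 - (n - row1)), min(row1, col1) + 1):
        px = prob(x)
        if px <= observed * (1 + 1e-9):  # small tolerance for floating-point ties
            p_value += px
    return min(1.0, p_value)


def gene_burden_test(case_carriers: int, n_cases: int,
                     control_carriers: int, n_controls: int) -> float:
    """Collapsing test: carriers of >= 1 qualifying rare variant vs. non-carriers."""
    return fisher_exact_two_sided(case_carriers, n_cases - case_carriers,
                                  control_carriers, n_controls - control_carriers)


# Hypothetical carrier counts for one gene; totals follow the 1,135 cases / 1,142 controls.
p = gene_burden_test(case_carriers=30, n_cases=1135, control_carriers=12, n_controls=1142)
print(f"burden p = {p:.4g}")
```

In practice, a p-value from such a test would then be evaluated against a threshold corrected for the number of genes tested.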

24 The SB portion of the RareBLISS exome project is described within chapter 3 via the published manuscript, Assessment of Whole-Exome Sequence Data in Attempted Suicide within a Bipolar Disorder Cohort (39). 1,018 BD individuals from the described sample who had complete SB histories were assessed in the SB exome project. These subjects were divided into two groups for comparison: those that had no past suicide attempts ( non-attempters ; N = 631) versus those that had at least one past suicide attempt with definite to severe intent to die ( attempters ; N = 387). The primary goal of the SB exome project was to investigate the potential role of the CDRV hypothesis specifically within SB, rather than to any comorbid psychiatric phenotypes. However, as has been noted, the psychiatric backgrounds of individuals that attempt suicide can be quite heterogeneous. This heterogeneity makes the isolation of signals specific to SB, rather than to comorbid psychiatric diseases, within a population representative of all SB difficult. Therefore, the RareBLISS dataset served ideally for this purpose due to a large, psychiatrically homogenous sample that allowed us to control for psychiatric background by comparing BD attempters and BD non-attempters. I was fortunate to enter the laboratory of Dr. Virginia Willour as the final year of sequencing for the RareBLISS dataset was taking place. Under her direction and that of many collaborators at the University of Iowa, Cold Spring Harbor Laboratories, and Johns Hopkins University, I was able to play an important role in the preparations of the final dataset as a member of the RareBLISS team. For both the SB and BD projects, I worked in the capacity of a bioinformaticist, generating many custom computational tools and pipelines utilizing other pre-built programs to address the preparation, quality control, storage, and analysis requirements of the projects. I served as the first author of 7

25 the SB manuscript and as a co-author for the BD manuscript. My specific responsibilities and the workflow for the exome project are outlined in Appendix A and described in detail within the introductory sections preceding the SB and BD project chapters in the dissertation. SUICIDAL BEHAVIOR TARGETED SEQUENCING PROJECT During our initial assessments of the RareBLISS data, we observed that a number of candidate genes with compelling prior evidence for involvement or association with SB had little coding variation and no evidence for association within the exome data evaluation. Recent evidence has indicated that, in addition to coding regions, untranslated regions and promoter elements near genes are particularly enriched for association findings within complex disease studies (40). Therefore, chapters 4-6 of this dissertation detail several efforts focused on much broader sequencing of several promising SB candidate genes, designed to capture many regulatory features near and within these genes that were not part of the RareBLISS data. The candidate genes were all assessed within a cohort of 949 individuals with BD that partially overlapped with the RareBLISS dataset (476 attempters, 473 non-attempters; 35% subject overlap with the RareBLISS dataset). Chapter 4 discusses an effort to completely sequence the candidate gene, SAT1, along with surrounding regulatory features near the gene locus. The complete project and our findings are described in the form of the published manuscript, Whole-gene sequencing investigation of SAT1 in attempted suicide (41). We selected the SAT1 gene due to extensive focus this gene has recently received within the literature. Variation within SAT1 has previously been associated with SB (42, 43), and this gene has been repeatedly identified as being differentially expressed within individuals who have either attempted or 8

26 died from suicide (42, 44-49). We sought to identify if rare variation within or near the gene might explain or support these past associations, particularly as no evidence for association was identified in the RareBLISS project. Our second targeted sequencing project focused on the assessment of 16 genes known to play important roles in the organization and function of glutamatergic synapses. Two agents known to reduce the risk of SB, lithium (50) and ketamine (51), both target the glutamatergic synapse as part of their activity, suggesting that this structure may be of particular importance to SB risk. Unlike SAT1, these genes were not sequenced completely due to the size of many of their loci. Instead, promoter regions and other sites with annotation suggesting regulatory function were targeted, along with all coding exons, leading to extensive additional sequencing data as compared with the RareBLISS project. Chapter 5 details our investigations of these genes through the presentation of the published manuscript, A targeted sequencing study of glutamatergic candidate genes in suicide attempters with bipolar disorder (52). Finally, chapter 6 details our targeted sequencing investigation of the FKBP5 gene. This project is presented through the published manuscript, Targeted Sequencing of FKBP5 in Suicide Attempters with Bipolar Disorder (53). FKBP5 serves as an important part of the hypothalamic-pituitary-adrenal (HPA) axis and variation within this gene has been associated with SB by several independent groups (54-58). As with the glutamatergic genes, only regions with evidence for regulatory function and exonic sites from all transcripts were included within the assessment of FKBP5 due to the large size of the gene. We began the process of designing these targeted gene sequencing efforts approximately one year after I joined the lab. I was able to play a lead role in generating a data processing pipeline to produce and assess the quality of the 9

complete dataset, using the RareBLISS dataset pipeline constructed by our collaborators at Johns Hopkins and Cold Spring Harbor Laboratory as a template (the complete workflow for this project is outlined in Appendix B). Under the direction of our chief analyst, Dr. Peter Zandi at Johns Hopkins, I also selected and performed all statistical analyses for these efforts. Finally, I assisted in drafting three independent manuscripts, serving as first author on our SAT1 manuscript and as a co-author on the glutamatergic and FKBP5 projects.

FIGURES

Figure 1.1: Heritabilities of Five Major Psychiatric Phenotypes. Values above each bar represent the estimated heritability from twin-study overviews of major depressive disorder (59), suicidal behavior (28), schizophrenia (60), bipolar disorder (8), and autism (61).

Figure 1.2: Significant Loci Identified per Cases Assessed in Psychiatric GWAS. This plot depicts the number of genome-wide significant loci identified in major GWAS of five psychiatric phenotypes. Each point represents a specific study. The y-axis represents the number of genome-wide significant loci identified in that study, where each locus can contain multiple SNPs that reached the threshold for genome-wide significance. The x-axis represents the number of case subjects assessed, where case subjects are those positive for the phenotype being assessed. Black (suicidal behavior) and green (bipolar disorder) circles and arrows denote the phenotypes assessed within this dissertation. Results from assessed samples within several representative studies were used to generate the plots for schizophrenia (62-66), bipolar disorder (9, 10, 12, 13, 67), suicidal behavior (31-33), major depressive disorder (68-70), and autism (71-73).

Figure 1.3: Distribution of Variants in Various Disease Risk Hypotheses. The figure represents the distribution of variants contributing to disease under three models:

- the common disease, common variant (CDCV) hypothesis, typically detected by genome-wide association studies (GWAS);
- the common disease, rare variant (CDRV) hypothesis, typically detected through sequencing efforts;
- Mendelian disease.

Adapted from Manolio et al. (3).

CHAPTER 2, COMPLETE PUBLISHED MANUSCRIPT, EXOME SEQUENCING OF FAMILIAL BIPOLAR DISORDER

STUDY CONTRIBUTION

This manuscript (38) reflects the primary assessment of 8 BD pedigrees as part of the RareBLISS exome sequencing dataset, with replication efforts in three large case/control BD sequencing datasets. I was privileged to enter this project during the final stages of a multi-year sequencing effort led by Dr. James Potash. This timing provided me with an opportunity to serve as a member of the RareBLISS team in the construction of the final dataset used in all manuscript analyses, as described here and within Appendix A. The creation of a large, high-quality whole-exome dataset requires many intricate steps to remove low-quality data and to format the final output files for all of the analyses required for the final manuscript. My role in this project was focused on the collaborative selection, implementation, and troubleshooting of these final data preparation steps. To do this, I received the complete dataset in the form of raw and pre-processed variant call files (VCFs) and plink-formatted datasets prepared by Dr. Mehdi Pirooznia at Johns Hopkins University. Over several months, I used these data to iteratively test the results of several quality thresholds:

- removal of low-confidence genotype calls;
- removal of variants demonstrating Hardy-Weinberg deviation;
- removal of variants with excessive missing genotype calls.

During this process, I assessed each modified dataset via custom scripts written to criteria provided by Dr. Peter Zandi. These assessments ensured that low-quality data points were appropriately removed and that no high-quality sites were inappropriately excluded. They also ensured that all remaining variant genotypes for each subject were consistent across each format, matched the original raw data, and matched overlapping genome-wide association data previously

generated for all subjects. I provided weekly progress reports via conference calls with the RareBLISS team throughout this process. In these calls I discussed the results of implemented methods, offered recommendations based on these results, and took part in deciding which quality adjustments to include in the final dataset used for all analyses. The case/control replication analyses presented in this manuscript include 1,135 unrelated BD case and 1,142 unrelated normal control exome sequences that are also part of the RareBLISS dataset. An additional step in preparing the case/control sequencing data was confirming the assumption that all subjects were unique and unrelated within the sample. My role in validating this assumption was to perform an identity-by-descent (IBD) analysis to flag subjects with unexpectedly high relatedness to one another. To do this, I wrote a custom script, under the guidance of Dr. Peter Zandi, to identify ~7,000 common variants (minor allele frequency > 0.10) that were well covered within all of the case and control subjects. I then used the identified variants to create a separate plink-formatted dataset and assessed these data with the plink IBD utility, confirming subject independence and finalizing the dataset for analysis.

COPYRIGHT STATEMENT

Reproduced with permission from JAMA Psychiatry (6): pp , doi: /jamapsychiatry Copyright (2016) American Medical Association. All rights reserved.

ABSTRACT

Importance
Complex disorders, such as bipolar disorder (BD), likely result from the influence of both common and rare susceptibility alleles. While common variation

has been widely studied, rare variant discovery has only recently become feasible with next-generation sequencing.

Objective
To use a combined family-based and case-control approach to exome sequencing in BD. Multiplex families served as an initial discovery strategy, followed by association testing in a large case-control meta-analysis.

Design, Setting, and Participants
We performed exome sequencing of 36 affected members with BD from 8 multiplex families and tested rare, segregating variants in 3 independent case-control samples consisting of 3541 BD cases and 4774 controls.

Main Outcomes and Measures
We used penalized logistic regression and 1-sided gene-burden analyses to test for association of rare, segregating damaging variants with BD. Permutation-based analyses were performed to test for overall enrichment with previously identified gene sets.

Results
We found 84 rare (frequency <1%), segregating variants that were bioinformatically predicted to be damaging. These variants were found in 82 genes. These genes were enriched for gene sets previously identified in de novo studies of autism (19 observed vs 10.9 expected, P = .0066) and schizophrenia (11 observed vs 5.1 expected, P = .0062). They were also enriched for targets of the fragile X mental retardation protein (FMRP) pathway (10 observed vs 4.4 expected, P = .0076). The case-control meta-analyses yielded 19 genes that were nominally associated with BD based either on individual variants or a gene-burden approach. Although no gene was individually significant after correction for

multiple testing, this group of genes continued to show evidence for significant enrichment of de novo autism genes (6 observed vs 2.6 expected, P = .028).

Conclusions and Relevance
Our results are consistent with the presence of prominent locus and allelic heterogeneity in BD. They suggest that very large samples will be required to definitively identify individual rare variants or genes conferring risk for this disorder. However, we also identify significant associations with gene sets composed of previously discovered de novo variants in autism and schizophrenia, as well as targets of the FMRP pathway. These findings provide preliminary support for the overlap of potential autism and schizophrenia risk genes with rare, segregating variants in families with BD.

INTRODUCTION

Family, twin, and adoption studies have provided strong evidence for the importance of genetic factors in the etiology of bipolar disorder (BD). Yet, despite an estimated heritability of 0.7 to 0.8 (74), identifying the specific genetic causes of BD has proved challenging. Numerous genome-wide linkage scans have been performed in BD, but the limited replication across studies suggests that variants of major effect are unlikely to exist. On the other hand, genome-wide association studies of BD in very large samples have recently implicated a number of variants of small effect (67). An important role for rare single-nucleotide variants in complex diseases has been proposed on theoretical grounds (75, 76), but empirical data have only recently begun to emerge. In autism, large-scale de novo studies (77, 78) have identified genes with recurrent, highly damaging mutations. These genes cluster in pathways involved in transcriptional regulation, chromatin modification, and synaptic function. Initial studies in schizophrenia have been underpowered to

implicate rare variants in a specific gene. They have, however, found convergent evidence for association of both damaging de novo and singleton variants in several gene sets: postsynaptic density (PSD) proteins, calcium channels, targets of the fragile X mental retardation protein (FMRP), and chromatin remodeling genes (79-81). Several case-control and family-based sequencing studies of BD are ongoing, but only a limited number have been published (16, 17, 19, 82). Two family studies (19, 82) of the Amish population found evidence for partial segregation of a number of variants but little convergence on a specific gene or a linkage location. Collins et al (16) found similar results when they performed exome sequencing of a large pedigree without conclusively identifying variants of large effect. Cruceanu et al (17) studied 25 pedigrees with lithium-responsive BD and also found limited evidence for cosegregation at the level of variants or genes across families. The largest study (20) of BD to date performed whole-genome sequencing of 200 individuals from 41 families with BD. It found evidence for an excess of rare variants in pathways associated with γ-aminobutyric acid and calcium channel signaling. The results of these early studies suggest that the pattern of complex inheritance seen with common variants in BD may also hold for rare variants. If so, there are potentially many risk alleles of modest effect distributed across large numbers of genes and noncoding regions. Given the inherent challenges of studying rare variants with currently available sample sizes (83), we used a hybrid approach to exome sequencing in BD. We sequenced 8 multiplex families as an initial discovery strategy, followed by a case-control meta-analysis of 3541 BD cases and 4774 controls.

KEY POINTS

Question
Can exome sequencing in families with bipolar disorder (BD) and case-control individuals reveal rare genetic variants associated with illness?

Findings
This case-control study found 84 rare, damaging variants that segregated with BD. These variants were located in genes enriched for those previously identified in studies of autism and schizophrenia, as well as in the fragile X mental retardation protein (FMRP) pathway. Case-control meta-analyses yielded 19 genes nominally associated with BD and continued enrichment of autism genes.

Meaning
This study provides preliminary support for the overlap of potential autism and schizophrenia risk genes, as well as targets of the FMRP pathway, with rare genetic variants found in families with BD.

METHODS

Primary Family and Case-Control Samples
Cases for the family sample were ascertained at The Johns Hopkins University as part of a genetic linkage study of BD (84), which has been approved by The Johns Hopkins University Institutional Review Board. Written informed consent was obtained from all participants. The sequenced family sample was selected to include pedigrees with more than 4 affected members carrying a diagnosis of BD type I (BD-I), BD type II (BD-II), or schizoaffective disorder, bipolar type. One additional family with multiple cases of BD-II was also selected. The 8 pedigrees are shown in Figure 2.1. They include a mean of 6.9 affected individuals per pedigree, and 36 of the 55 affected individuals were sequenced.

The primary case-control sample (1135 cases and 1142 controls) was generated from an exome sequencing study of BD and controls, referred to as the Rare Bipolar Loci Identification Through Synaptome Sequencing (RareBLISS) exome study (described more fully in the etext in the Supplement). DNA from both cases and controls was isolated from lymphoblastoid cell lines. All cases and controls were of self-reported European ancestry.

Exome Sequencing
We performed exome capture using capture arrays (NimbleGen EZ Exome, version 1 and version 2; Roche). This was followed by standard alignment and variant calling with the Genome Analysis Toolkit by McKenna et al (85) (a full description is given in the etext in the Supplement). Identified variants were annotated with ANNOVAR (86) using reference assembly (RefSeq, release 65). For annotation of potentially damaging variants, we followed the example of a recent schizophrenia exome sequencing study (79). Like that study, we defined 3 successively more inclusive annotation categories based on 5 bioinformatics algorithms (SIFT, PolyPhen-2 HVAR, PolyPhen-2 HDIV, LRT, and MutationTaster) provided in the Database for Nonsynonymous SNPs and Their Functional Predictions (87). The 3 categories were characterized as follows:

- nonsynonymous broad: evidence of damaging effect by any 1 of the 5 bioinformatics algorithms;
- nonsynonymous strict: evidence of damaging effect by all 5 bioinformatics algorithms;
- disruptive: canonical splice site, nonsense, or frameshift mutations.
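A minimal Python sketch of the three annotation categories follows. It illustrates the definitions under stated assumptions and is not the study's code. Each variant is assumed to already carry a consequence label and a per-algorithm damaging call. The consequence labels, dictionary keys, and the names `Variant` and `annotation_classes` are hypothetical. "Successively more inclusive" is read here as nesting, so a disruptive variant also counts as NS strict and NS broad. That reading matches the Results statement that the 84 NS broad variants include the 23 NS strict and 5 disruptive variants.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

# The five predictors named in the text (key spellings here are illustrative).
PREDICTORS = ("SIFT", "PolyPhen2_HVAR", "PolyPhen2_HDIV", "LRT", "MutationTaster")

# Hypothetical consequence labels standing in for the "disruptive" class.
DISRUPTIVE_CONSEQUENCES = {"canonical_splice", "nonsense", "frameshift"}


@dataclass
class Variant:
    consequence: str  # hypothetical label, e.g. "missense" or "nonsense"
    # predictor name -> True if that algorithm called the variant damaging
    damaging: Dict[str, bool] = field(default_factory=dict)


def annotation_classes(v: Variant) -> Set[str]:
    """Return every nested annotation class that a variant belongs to."""
    if v.consequence in DISRUPTIVE_CONSEQUENCES:
        # Nested reading: disruptive variants also count as NS strict and NS broad.
        return {"disruptive", "NS_strict", "NS_broad"}
    classes: Set[str] = set()
    if v.consequence == "missense":
        # A missing prediction is treated as "not damaging" (an assumption).
        calls = [v.damaging.get(p, False) for p in PREDICTORS]
        if all(calls):
            classes.add("NS_strict")  # damaging by all 5 algorithms
        if any(calls):
            classes.add("NS_broad")  # damaging by at least 1 of 5
    return classes


if __name__ == "__main__":
    examples = {
        "all five damaging": Variant("missense", {p: True for p in PREDICTORS}),
        "SIFT only": Variant("missense", {"SIFT": True}),
        "stop-gain": Variant("nonsense"),
        "no damaging calls": Variant("missense"),
    }
    for label, variant in examples.items():
        print(label, sorted(annotation_classes(variant)))
```

Under this sketch's assumption, a missense variant that lacks one of the five predictions can still enter NS broad but never NS strict. A pipeline that instead excluded variants with incomplete predictions would produce different category counts.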

Family Analysis
For the family-based analysis, we used a filtering approach to identify low-frequency (minor allele frequency [MAF] <1%) damaging variants, defined by the nonsynonymous broad annotation, that segregated with all affected relatives while allowing for one missing genotype per family. This strategy was motivated by the hypothesis that risk variants of moderate to high penetrance would be shared within families but not necessarily across different families, given the failure of prior linkage investigations to identify replicable findings.

Case-Control Analysis
From the family analysis, we identified rare and damaging segregating variants that were tested in the RareBLISS case-control sample. For single-variant tests, we used logistic regression with the Firth penalized likelihood method, which can incorporate discrete and continuous covariates and provides more robust effect size estimates when data are sparse (88). We included as covariates 4 ancestry-based principal components and variables indexing differences in capture kits and sequencing platforms. The principal components were derived with the software Eigensoft (89) from common variants extracted from the exome sequencing data set. For gene-level tests, we used PLINK/SEQ to test the overall burden of rare, damaging variants in cases vs controls. We carried out gene-level tests under 3 frequency categories (MAF <1%, MAF <0.1%, and singletons) and 3 annotation categories (nonsynonymous broad [NS broad], NS strict, and disruptive), for a total of 9 tests.

Meta-Analysis
We further examined the variants and genes implicated by the family analysis in 2 additional case-control samples (etext in the Supplement). The first

comprised 1022 cases with BD and 2220 controls from Sweden (90). The second consisted of an interim data freeze from the Bipolar Research in Deep Genome and Epigenome Sequencing (BRIDGES) Study, comprising whole-genome sequencing results for 1388 cases with BD and 1412 controls, all of European ancestry (etext in the Supplement). Phenotypic details are provided in the etext in the Supplement. Both additional data sets were analyzed in the same way as RareBLISS: penalized logistic regression for single-variant analyses and PLINK/SEQ burden tests for the gene-level association tests. Joint analysis has been shown to be more powerful than replication (91). We therefore performed variant-level and gene-level tests of association, followed by fixed-effects meta-analysis of the results across all 3 samples using METAL (92).

Gene Set Enrichment Analysis
Genes with segregating variants were found to include a number of genes previously observed in de novo investigations of autism and schizophrenia, and some were also localized to the PSD. To test for specific enrichment of these gene sets, we used curated lists from prior studies (77, 80). These lists summarized genes with de novo nonsense and missense variants in studies of autism (n = 1781), schizophrenia (n = 670), and intellectual disability (n = 141). They also included genes encoding proteins found in the PSD (n = 1398) and the FMRP pathway (n = 795). We subsequently tested whether genes with a segregating variant were enriched for any of these 3 categories. We did this by randomly selecting an equal number of genes captured by our exome study while matching on the following 3 potentially confounding metrics:

- cumulative exon length (±20%);
- sequence coverage (±20%);
- a gene-specific corrected measure of intolerance to missense variation (missense z score).

The missense z score represents a standardized

measure of the deviation between observed and expected missense variants found in the Exome Aggregation Consortium database (93). We performed permutations and counted the number of times that randomly selected genes were found in each of the 3 gene sets. We then compared our observed counts of overlap with the 3 gene sets against this null distribution to obtain empirical P values. As an additional step to evaluate the potential role of background variation in our results, we also obtained a curated list of genes (n = 1215) found to harbor de novo variants (missense and loss of function) in unaffected siblings from simplex families with autism (77). We tested for enrichment of these control de novo genes using the same matched permutation procedure.

RESULTS

Family Analysis
We first searched for variants of moderate to high penetrance in 8 multiplex families by sequencing at least 4 affected members in each family (Figure 2.1). A total of 7551 variants were found to segregate as heterozygotes in at least 1 family, ranging from 511 to 1683 variants per family (Table 2.1). This range reflects the size of the pedigrees and the decreased likelihood of shared variants as the number of sequenced cases increases. Filtering for rare (MAF <1%), damaging variants as defined by the NS broad category identified 84 variants that segregated in at least 1 family. Of these, 23 variants further met the NS strict criteria and 5 variants were disruptive. The 84 segregating variants were found in 82 independent genes (shown fully in etable 1 in the Supplement). Two genes (LAMA4 and OBSCN) each had 2 segregating variants. The 2 segregating variants in LAMA4 were in the same family, while the 2 variants in OBSCN were found in independent families. However, both genes are

large and evolutionarily unconstrained (94), increasing the likelihood that they could be false positives.

Case-Control Follow-Up
To obtain convergent evidence for association with BD, we examined whether segregating variants showed consistent evidence for association in 3 ongoing case-control studies of BD in individuals of European ancestry. These were the RareBLISS (1135 cases and 1142 controls), Sweden (1018 cases and 2220 controls), and BRIDGES (1388 cases and 1412 controls) studies. Each study was analyzed separately, and the results were combined in a fixed-effects meta-analysis.

Variant-Level Analysis
We performed penalized logistic regression of the 84 segregating variants in each of the 3 data sets, followed by a fixed-effects meta-analysis that included 3541 cases and 4774 controls (etable 2 in the Supplement). Of the 84 variants, 49 were present in at least 1 data set, 23 were present in all 3, and 35 were not found in any of them. Variants with a meta-analytic association P < .10 and an odds ratio greater than 1 are listed in Table 2.2. These include 3 variants with nominally significant findings (P < .05), in the MLK4, APPL2, and HSP90AA1 genes.

Gene-Burden Analysis
Power to identify an association with single variants is limited, and allelic heterogeneity in causal genes is highly likely (95). We therefore sought additional evidence for association using gene-based ("burden") association tests of the 82 genes that included at least 1 rare, segregating variant from the family analysis. One-sided burden tests were performed in PLINK/SEQ under 3 frequency classes (MAF <1%, MAF <0.1%, and singletons) and 3 annotation

classes (NS broad, NS strict, and disruptive). Association analyses were performed independently in the 3 case-control samples, followed by a fixed-effects meta-analysis. The results of all of the tests performed for the 82 genes are summarized in etable 3 in the Supplement, with the top findings (P < .05) summarized in Table 2.3. No individual P value survived full correction for multiple testing (P < ), but 16 genes showed nominal evidence for association (P < .05). Five of these genes were previously found to have de novo damaging variants in investigations of autism (RPGRIP1L, FRAS1, AHNAK, KDM5B, and SLC12A4). Three of the genes (SLC4A1, APPL2, and AHNAK) encode proteins localized to the PSD, which has recently been implicated in rare variant studies of schizophrenia (77, 79, 80, 96).

Gene Set Enrichment
These observations led us to ask whether segregating variants were located within 3 particular gene sets (autism, schizophrenia, and PSD) at a rate that exceeded chance expectation. We considered all 82 genes with segregating variants under the NS broad model (etable 1 in the Supplement). Of these, 10 were PSD genes. A further 19 had previously been identified as harboring de novo missense or nonsense variants in investigations of autism, and 11 in investigations of schizophrenia. We tested for overrepresentation in these 3 gene sets of the 82 genes identified by the initial segregation analysis. We found evidence for enrichment in the autism set (19 observed vs 10.9 expected, P = .0066) and the schizophrenia set (11 observed vs 5.1 expected, P = .0062). The results for the PSD set (10 observed vs 6.9 expected, P = .15) were consistent with chance expectation. In a further analysis, we tested whether the segregating genes with nominal evidence for association in the case-control meta-analyses, in either the

variant-level or gene-level tests (n = 19) (Tables 2.2 and 2.3), also showed evidence for enrichment in the same 3 gene sets. This permutation analysis confirmed significant enrichment of the de novo autism gene set (6 observed vs 2.6 expected, P = .028). It did not confirm enrichment of the PSD set (4 observed vs 1.7 expected, P = .09) or the de novo schizophrenia set (1 observed vs 1.0 expected, P = .65). Given the significance of the autism de novo gene set, we also tested for enrichment of the FMRP pathway. We found significant evidence for enrichment in the 82 segregating genes (10 observed vs 4.4 expected, P = .0076) but not in the subset of 19 genes with nominal evidence of association in the meta-analysis. We further examined the de novo intellectual disability gene set but found no evidence of enrichment (0 observed, P > .99). The significant results for the 82 segregating genes survive correction for the 5 gene sets tested, whereas those for the smaller subset of genes with additional nominal significance in the meta-analysis do not. To evaluate the potential confounding role of baseline or expected rates of variation, we tested whether genes implicated by de novo variants found in control siblings were enriched among our segregating variants. The results based on the 82 genes with segregating variants were consistent with chance expectation (10 observed vs 7.6 expected, P = .21).

DISCUSSION

Our study represents one of the first large-scale exome sequencing efforts in BD and one of the first to combine a family-based and case-control design. In each of our selected multiplex families, we found several rare, segregating variants of predicted damaging effect. We therefore sought supportive evidence for association in 3 large-scale case-control sequencing studies. Variant-level and gene-burden analyses provided supportive nominal evidence (P < .05)

for 19 of these genes, although neither variant-based nor gene-burden results met study-wide thresholds for statistical significance. However, we found support for enrichment of segregating variants in genes identified by de novo studies of autism and schizophrenia, with additional evidence for the autism gene set enrichment from case-control data. The 19 genes implicated by segregating variants that showed the strongest evidence for case-control association with BD (Tables 2.2 and 2.3) included more members of the de novo autism gene set than expected. This enrichment was based on the following 6 genes: HSP90AA1, RPGRIP1L, FRAS1, AHNAK, KDM5B, and SLC12A4. The most strongly implicated of these genes is KDM5B, in which 2 nonsense and 2 missense de novo mutations have been found in sporadic cases with autism (77). KDM5B (also known as JARID1B) encodes a histone H3 lysine 4 (H3K4) demethylase that has been linked to neural differentiation in embryonic stem cells (97). Intriguingly, the recent Psychiatric GWAS Consortium pathway-based analysis of common genetic variation identified histone H3K4 methylation as the pathway most strongly associated with BD (98). Moreover, histone H3K4 methylation was also the most strongly associated pathway in a cross-disorder analysis of BD, schizophrenia, and major depressive disorder. This raises the possibility that variation in this pathway may increase susceptibility to a broad range of mental disorders. Genetic overlap between BD and schizophrenia has been well documented by family studies (99) and by genome-wide association studies (100). In addition, emerging data suggest etiological overlap between BD and autism. In particular, analyses of the Swedish national registers have yielded evidence for an increased risk of autism in individuals with BD (relative risk [RR], 13.2) and in their first-degree relatives (RR, ), with a

coheritability estimate of 65% (101). Swedish registry data have also shown the inverse, with an increased risk of BD in individuals with autism (RR, 6.6) and their siblings (RR, 1.8) (102). Overlap between BD and autism is also seen in investigations of rare copy number variation (CNV). A recent meta-analysis of CNV studies showed evidence for association with BD of 3 CNVs (1q21.1 dup, 3q29 del, and 16p11.2 dup) originally implicated in both autism and schizophrenia (103, 104).

Our study should be seen in light of a number of important limitations. First, a major challenge of rare variant studies is the increasing recognition that very large sample sizes may be necessary to perform a fully powered case-control study (83). Our hybrid design, a family-based study followed by case-control association, was intended to improve power by limiting the genomic search space to variants and genes identified by an initial segregation analysis. Even so, power analyses of the combined case-control sample showed that the meta-analysis remained underpowered to detect the types of effect sizes and allele frequencies found in our study (efigure in the Supplement). Second, our study focused exclusively on exome variation and thus could not detect any role of rare noncoding variants. Third, our coverage of the exome is typical of most studies of this type, but it is incomplete at approximately 80%. This is an inevitable limitation of current exome capture and sequencing technology. Fourth, while we have used widely accepted bioinformatics tools to classify variants as damaging, these tools are probabilistic and imprecise and will likely miss the effect of variants that are tissue specific. Fifth, in focusing only on fully segregating variants, we have not considered less penetrant variants that may be involved in disease susceptibility, because these variants would be even more difficult to detect with our available sample size.
In sensitivity analyses, we performed a

broad analysis of variants shared among 2 or more affected family members. This analysis did not find evidence for any association meeting correction for multiple testing, or for enrichment similar to that found in the analysis presented in this study. Sixth, our study has relied solely on traditional clinical phenotypes and has not characterized individuals in ways that may align more closely with disease pathophysiology. It is likely that a new generation of genotype-first studies will be needed to delineate which specific phenotypes will constitute molecular subtypes of BD (105).

CONCLUSIONS

In summary, although our study remains underpowered to implicate rare variants in individual genes, we have found preliminary evidence for the overlap of potential autism and schizophrenia risk genes with our segregating variants. These results provide further data on shared genetic susceptibility across the major psychiatric disorders.

CONFLICT OF INTEREST DISCLOSURES

Dr McCombie reported participating over the past 4 years in meetings sponsored by Illumina and Pacific Biosciences, which had no decision-making roles related to this study. He also reported receiving travel reimbursement and honoraria for presentations. In addition, he reported being a founder and shareholder of Orion Genomics, which focuses on plant genomics and cancer genetics. No other disclosures were reported.

FUNDING SUPPORT

This work was supported by grants R00MH86049 (Dr Goes), K01MH (Dr Pirooznia), R01MH (Dr McCombie), MH and MH (Dr Boehnke), and R01MH (Dr Potash) from the National Institute of Mental Health and by a National Alliance for Research in

Schizophrenia and Affective Disorders (NARSAD) Young Investigator Award (Dr Stahl).

ROLE OF THE FUNDER/SPONSOR

The funding organizations had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.

ADDITIONAL CONTRIBUTIONS

Aravinda Chakravarti, PhD, provided helpful analytical contributions. We thank the Bipolar Research in Deep Genome and Epigenome Sequencing (BRIDGES) study and affiliated study members for prepublication data based on the sequencing of samples from multiple studies, including the STEP-BD sample from the National Institute of Mental Health repository. The affiliated study members are Goncalo Abecasis, DPhil; Gerome Breen, PhD; William G. Iacono, PhD; Matt McGue, PhD; Melvin G. McInnis, MD; Richard M. Myers, PhD; Carlos N. Pato, MD, PhD; and John B. Vincent, PhD. For the Swedish bipolar disorder exome sequencing study, Steve McCarroll, PhD, helped generate the sequencing data. We acknowledge the use of the Swedish National Quality Register for Bipolar Disorder (BipoläR). We also thank the clinical collaborators, data collectors, and facilitators in the St Göran Project (Stockholm, Sweden) and in the Department of Medical Epidemiology and Biostatistics at Karolinska Institutet, Sweden, for their help with recruitment of participants. Finally, we particularly thank the individuals and families who volunteered and thus made this work possible.

FIGURES

Figure 2.1: Pedigrees of the 8 Families With Bipolar Disorder (BD) Selected for Sequencing. Dashed lines indicate the individuals who were sequenced. A square represents a male individual; a circle, a female individual; shading, an affected individual; and a slash mark, a deceased individual.

TABLES

Table 2.1: Overview of Segregating Variants per Family by Allele Frequency and Functional Annotation Strata No. of Individuals Sequenced With BD-I or BD-II No. of Exonic Variants Nonsense MAF1 NS broad NS strict Disruptive Pedigree Total Abbreviations: BD, bipolar disorder; MAF1, minor allele frequency less than 1%; NS, nonsynonymous.

Table 2.2: Variant-Level Meta-analysis Showing All Variants with a Meta-analytic Association P <.10

Meta-analysis ORs (95% CIs) and P Values (Cases and Controls) Variant Gene OR P Value Direction a RareBLISS Sweden BRIDGES chr1: _ga MLK OR = 2.29 (0.40- OR = 2.81 (0.68- OR = 3.06 ( ); P = ); P = ); P =.10 (4 of 1131 vs 2 of (4 of 1014 vs 3 (7 of 1388 vs ) of 2217) of 1412) chr12: _ga APPL ?+ OR = 6.13 ( ); P =.189 (2 of 1133 vs 0 of 1142) chr14: _ac HSP90AA ?+ OR = 2.48 ( ); P =.236 (5 of 1130 vs 2 of 1140) chr13: _ga POSTN OR = 1.60 ( ); P =.499 (5 of 1130 vs 3 of 1139) chr10: _tg EIF3A OR = 2.79 ( ); P =.031 (15 of 1120 vs 5 of 1137) NA OR = 7.14 ( ); P =.108 (3 of 1388 vs 0 of 1412) NA OR = 3.06 ( ); P =.10 (7 of 1388 vs 2 of 1412) OR = 5.10 ( ); P =.077 (3 of 1015 vs 1 of 2219) OR = 0.79 ( ); P =.543 (8 of 1010 vs 23 of 2197) OR = 1.60 ( ); P =.486 (5 of 1388 vs 3 of 1412) OR = 1.63 ( ); P =.156 (21 of 1388 vs 13 of 1412) Post-Synaptic Density De Novo Schizophrenia No No No Yes No No De Novo Autism Yes Yes Yes No No No No No No

Abbreviations: BRIDGES, Bipolar Research in Deep Genome and Epigenome Sequencing; NA, not applicable; OR, odds ratio; RareBLISS, Rare Bipolar Loci Identification Through Synaptome Sequencing.
a Direction refers to the odds ratios found in the meta-analyzed studies being in the same (+) or opposing (-) direction relative to the risk allele found in the family sample.

Table 2.3: Gene-Based Meta-analysis Showing All Genes with a Meta-analytic Association P <.05 a

Meta-analysis ORs (95% CIs) and P Values (Cases and Controls) Gene Annotation Frequency OR P Value Direction a RareBLISS Sweden BRIDGES VWA8 NS broad MAF OR = 1.74 (0.62- OR = 1.68 (0.95- OR = 1.60 ( ); P =.216 (9 of 2.91); P =.103 (22 of 2.66); P =.043 (37 of 1126 vs 5 of 1137) 996 vs 29 of 2191) 1351 vs 24 of 1388) RPGRIP1L NS broad Singleton OR = 0.85 (0.26- OR = 3.21 (1.19- OR = 4.70 ( ); P =.702 (5 of 9.19); P =.03 (9 of 24.27); P =.008 ( vs 6 of 1136) 1009 vs 6 of 2214) of 1377 vs 2 of 1410) SLC4A1 NS broad MAF OR = 3.92 ( ); P =.007 (13 of 1122 vs 3 of 1139) DAG1 NS broad Singleton OR = 0.53 ( ); P =.905 (4 of 1131 vs 8 of 1134) APPL2 NS strict MAF OR = 7.06 ( ); P =.104 (3 of 1132 vs 0 of 1142) FRAS1 NS strict MAF OR = 1.01 ( ); P =.419 (7 of 1128 vs 7 of 1135) AHNAK NS broad MAF OR = 1.12 ( ); P =.378 (39 of 1096 vs 35 of 1107) KDM5B NS broad Singleton OR = 0.70 ( ); P =.857 (4 of 1131 vs 6 of 1136) CCDC109B NS broad Singleton OR = 2.35 ( ); P =.38 (3 of 1132 vs 1 of 1141) SLC12A4 NS broad MAF OR = 1.39 ( ); P =.25 (14 of 1121 vs 10 of 1132) DHX38 NS strict MAF OR = 6.42 ( ); P =.007 (9 of 1126 vs 1 of 1141) OR = 1.08 ( ); P =.482 (18 of 1000 vs 37 of 2183) OR = ( ); P =.001 (8 of 1010 vs 1 of 2219) OR = 1.60 ( ); P =.224 (13 of 1005 vs 18 of 2202) OR = 2.33 ( ); P =.02 (17 of 1001 vs 16 of 2204) OR = 1.61 ( ); P =.029 (55 of 963 vs 76 of 2144) OR = 6.58 ( ); P =.008 (7 of 1011 vs 2 of 2218) OR = ( ); P =.113 (2 of 1016 vs 0 of 2220) OR = 1.96 ( ); P =.088 (16 of 1002 vs 18 of 2202) OR = 3.06 ( ); P =.204 (3 of 1015 vs 2 of 2218) OR = 1.40 ( ); P =.181 (28 of 1360 vs 21 of 1391) OR = 1.80 ( ); P =.336 (4 of 1384 vs 2 of 1410) OR = 1.70 ( ); P =.142 (12 of 1376 vs 7 of 1405) OR = 1.30 ( ); P =.383 (8 of 1380 vs 6 of 1406) OR = 1.10 ( ); P =.336 (60 of 1328 vs 57 of 1355) OR = 3.10 ( ); P =.186 (4 of 1384 vs 1 of 1411) OR = 3.10 ( ); P =.178 (4 of 1384 vs 1 of 1411) OR = 1.20 ( ); P =.348 (18 of 1370 vs 15 of 1397) OR = 0.10 ( ); P =.10 (0 of 1388 vs 3 of 1409) Post-Synaptic Density No No Yes No Yes No Yes No No No No De Novo Autism No Yes No No No Yes Yes Yes No Yes No

Table 2.3 continued

POSTN NS broad MAF OR = 0.69 ( ); P =.833 (6 of 1129 vs 9 of 1133) LRP5 NS broad Singleton OR = 0.45 ( ); P =.986 (7 of 1128 vs 16 of 1126) HYOU1 NS broad MAF OR = 0.39 ( ); P =.98 (4 of 1131 vs 11 of 1131) RBM4B NS broad Singleton OR = 1.41 ( ); P =.477 (3 of 1132 vs 2 of 1140) FAM129A NS broad Singleton OR = 1.01 ( ); P =.648 (3 of 1132 vs 3 of 1139) OR = 2.66 ( ); P =.04 (11 of 1007 vs 9 of 2211) OR = 2.03 ( ); P =.091 (12 of 1006 vs 13 of 2207) OR = 1.87 ( ); P =.037 (32 of 986 vs 38 of 2182) OR = 6.56 ( ); P =.049 (4 of 1014 vs 1 of 2219) OR = 6.56 ( ); P =.053 (4 of 1014 vs 1 of 2219) OR = 3.10 ( ); P =.087 (7 of 1381 vs 2 of 1410) OR = 2.10 ( ); P =.097 (11 of 1377 vs 5 of 1407) OR = 1.30 ( ); P =.222 (32 of 1356 vs 26 of 1386) OR = 1.30 ( ); P =.494 (4 of 1383 vs 3 of 1409) OR = 1.60 ( ); P =.354 (5 of 1383 vs 3 of 1409) No No No No No No No No No No

Abbreviations: BRIDGES, Bipolar Research in Deep Genome and Epigenome Sequencing; MAF1, minor allele frequency less than 1%; MAF01, minor allele frequency less than 0.1%; NS, nonsynonymous; OR, odds ratio; RareBLISS, Rare Bipolar Loci Identification Through Synaptome Sequencing.
a Gene-burden tests were performed in PLINK/SEQ, with ORs obtained from penalized logistic regression. Meta-analysis was performed in METAL using the PLINK/SEQ burden P values. None of the genes showed a meta-analytic association (P <.05) with de novo schizophrenia.
b Direction refers to the odds ratios found in the meta-analyzed studies being greater than 1.0 (+) or less than 1.0 (-).

SUPPLEMENTAL INFORMATION

Supplemental figures and tables referenced in this article (38) can be accessed on the publisher's website.

CHAPTER 3, COMPLETE PUBLISHED MANUSCRIPT, ASSESSMENT OF WHOLE-EXOME SEQUENCE DATA IN ATTEMPTED SUICIDE WITHIN A BIPOLAR DISORDER COHORT

STUDY CONTRIBUTION

This manuscript details the examination of 387 BD subjects with a history of suicide attempt and 631 BD subjects with no past suicide attempts, drawn from the RareBLISS whole-exome sequencing dataset. I joined this project as the final stages of sequencing were being completed. As part of the RareBLISS team, I played an important role in dataset preparation (Appendix A) and a central role in preparing and maintaining our local dataset. I also performed all statistical analyses, under the direction of Dr. Virginia Willour and Dr. Peter Zandi, and wrote the final manuscript (39). My responsibilities are detailed below. The large volume of data produced by whole-exome sequencing requires careful processing and storage to allow efficient data access and analysis. My first responsibility in this project was therefore to help ensure the quality of the final dataset and to build a platform for storing and analyzing the data. To do this, I received variant call format (VCF) files from our collaborators at Johns Hopkins. With guidance from our chief analyst at Johns Hopkins, Dr. Peter Zandi, I assisted in the quality control and production of the final dataset (detailed in the introduction of chapter 2 and Appendix A). I also annotated the complete dataset using our computational cluster and constructed a SQL-based database to store all genotype, subject, annotation, and result information for our analyses. Over approximately 1.5 years, concurrent with dataset finalization, a variety of database designs were built and tested for reliability and efficiency.
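The text says only that a SQL-based database held genotype, subject, annotation, and result information. It does not describe the design. The following is a minimal, hypothetical sketch using Python's built-in `sqlite3` module. Everything in it is an illustrative assumption and not the schema used in this work: the table and column names, the sparse storage of non-reference genotypes, the `carrier_counts` and `build_demo_db` helpers, and the toy data.

```python
import sqlite3

# Hypothetical schema (illustration only). Genotypes are stored sparsely:
# only non-reference calls get a row, which keeps rare-variant data compact.
SCHEMA = """
CREATE TABLE subject (
    subject_id TEXT PRIMARY KEY,
    sex        TEXT,
    attempter  INTEGER NOT NULL,  -- 1 = suicide attempter, 0 = nonattempter
    platform   TEXT               -- capture array version
);
CREATE TABLE variant (
    variant_id INTEGER PRIMARY KEY,
    chrom      TEXT NOT NULL,
    pos        INTEGER NOT NULL,
    ref        TEXT NOT NULL,
    alt        TEXT NOT NULL,
    UNIQUE (chrom, pos, ref, alt)
);
CREATE TABLE annotation (
    variant_id INTEGER REFERENCES variant(variant_id),
    gene       TEXT,
    func_class TEXT               -- e.g. coding_disruptive, regulatory_narrow
);
CREATE TABLE genotype (
    subject_id TEXT REFERENCES subject(subject_id),
    variant_id INTEGER REFERENCES variant(variant_id),
    alt_count  INTEGER NOT NULL CHECK (alt_count IN (1, 2)),
    PRIMARY KEY (subject_id, variant_id)
);
CREATE INDEX idx_annotation_gene ON annotation(gene, func_class);
"""


def carrier_counts(conn, gene, func_class):
    """Map attempter status (1/0) to the number of subjects carrying at least
    one non-reference allele of the given functional class in `gene`."""
    sql = """
        SELECT s.attempter, COUNT(DISTINCT g.subject_id)
        FROM genotype AS g
        JOIN annotation AS a ON a.variant_id = g.variant_id
        JOIN subject    AS s ON s.subject_id = g.subject_id
        WHERE a.gene = ? AND a.func_class = ?
        GROUP BY s.attempter
    """
    return dict(conn.execute(sql, (gene, func_class)).fetchall())


def build_demo_db():
    """Toy in-memory database with made-up subjects and one made-up variant."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO subject VALUES (?, ?, ?, ?)",
                     [("S1", "F", 1, "v2"), ("S2", "M", 0, "v2"), ("S3", "F", 1, "v1")])
    conn.execute("INSERT INTO variant VALUES (1, '1', 1000, 'G', 'A')")
    conn.execute("INSERT INTO annotation VALUES (1, 'GENE_X', 'coding_disruptive')")
    conn.executemany("INSERT INTO genotype VALUES (?, 1, 1)", [("S1",), ("S3",)])
    return conn


print(carrier_counts(build_demo_db(), "GENE_X", "coding_disruptive"))  # {1: 2}
```

Indexing annotations by gene and functional class keeps per-gene carrier queries cheap. If call-level filters need to be re-applied later, a fuller design would also store per-call depth and genotype quality.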

Once a high-quality, accessible dataset had been constructed, a comprehensive and carefully considered data analysis plan was needed. With guidance from Dr. Virginia Willour and Dr. Peter Zandi, I was tasked with identifying and implementing all statistical analyses to be included in the manuscript. To guide this process, I examined the existing literature as well as the template for the RareBLISS BD case/control analyses. I selected the minor allele frequency thresholds and functional categories to be applied to coding and regulatory sequence data, taking careful account of the size and content of our dataset as well as the characteristics of SB. Over approximately two years, I wrote, revised as necessary, and ran custom scripts to carry out all individual variant, gene, and pathway-based analyses and to format the figures and tables presenting the final outputs. With the assistance of Dr. Peter Zandi and Dr. Mehdi Pirooznia from Johns Hopkins University, I carefully checked all of our results for errors and for violations of the assumptions of the statistical models used before submission for publication. The end result is a comprehensive, high-quality analysis that represents the only exome-wide examination of rare functional variation in SB.

ABSTRACT

Suicidal behavior is a complex and devastating phenotype with a heritable component that has not been fully explained by existing analyses of common genetic variants. This study represents the first large-scale DNA sequencing project designed to assess the role of rare functional genetic variation in suicidal behavior risk. To accomplish this, whole-exome sequencing data for ~19,000 genes were generated for 387 bipolar disorder subjects with a history of suicide attempt and 631 bipolar disorder subjects with no prior suicide attempts. Rare functional variants were assessed in all exome genes as well as in pathways hypothesized to contribute to suicidal behavior risk. No result survived conservative Bonferroni correction, although many suggestive findings emerged that merit additional attention. We also observed nominal support for previously reported associations in genes, such as BDNF, and pathways, such as the hypothalamic-pituitary-adrenal axis. Finally, we identified a novel pathway association driven by aldehyde dehydrogenase genes. Ultimately, this investigation explores variation left largely untouched by existing efforts in suicidal behavior, providing a wealth of novel information for future investigations, such as meta-analyses.

INTRODUCTION

Despite continuing work to improve the diagnosis and care of patients with severe psychiatric disorders that carry a substantially increased risk for suicidal behavior, such as bipolar disorder (BP) (106), rates of attempted and completed suicide have not fallen. Suicide now accounts for over 800,000 deaths around the world each year and is the second leading cause of death for individuals aged (26). In addition, current worldwide estimates suggest that suicide accounts for 50% of all violent deaths in men and 71% of violent deaths in women, with an estimated 20 attempts for every death by suicide, imposing a terrible and ongoing societal cost (26). Evidence from twin-based studies has demonstrated that suicidal behavior has a genetic component, with an estimated heritability of 30-50% (28). Two sources are suspected to drive this heritability: psychiatric disorders, such as mood and alcohol/substance use disorders, and independent heritable factors, such as impulsive aggression (29, 30). Discovering the genetic basis of these factors could offer critical insight into the biological pathways and mechanisms that contribute to suicidal behavior risk and could provide new targets for patient assessment and treatment.

Toward this end, numerous candidate gene and pathway-driven studies have identified many genetic associations with suicidal behavior (online suppl. Table S1; for all online suppl. material, see ). The number and diversity of these findings underscore the complexity of the genetic risk for suicidal behavior and make follow-up through individual candidate gene studies inefficient. As a result, many recent studies have adopted broad, hypothesis-free methods, including gene expression (42, ), linkage ( ), and genome-wide association studies (GWAS) of common variants (31-35, ). These studies have identified additional targets that may contribute to the risk of suicidal behavior, but very few implicated sites have been replicated (120, 121). Further, few implicated candidate genes have been deeply assessed for the possibility that rare functional variation within them contributes to the phenotype (36, 37, 41). The primary aim of our study was to broadly examine rare functional variation throughout the human exome to identify individual variants, genes, and/or pathways whose variation differs significantly between individuals who have attempted suicide and those who have not. To do this, we took advantage of a large BP whole-exome sequencing project that has generated data on the coding exons of approximately 19,000 genes in 1,018 BP subjects with an available suicide attempt history. The resulting attempted-suicide whole-exome sequencing project is the first such effort in the field of suicide genetics.

MATERIALS AND METHODS

Sample Collection

Our BP sample consists of 1,018 age-matched (P = 0.45) unrelated individuals of European-American ancestry. These subjects overlapped with our prior GWAS study (91% overlap) (32). Briefly, subjects were diagnosed with BP type 1 (942 subjects), schizoaffective disorder bipolar type (75 subjects), or BP not otherwise specified (1 subject) according to Research Diagnostic Criteria, Diagnostic and Statistical Manual of Mental Disorders (DSM)-III-R, or DSM-IV criteria. All individuals were interviewed using either the Diagnostic Interview for Genetic Studies (DIGS) (122) or the Schedule for Affective Disorders and Schizophrenia (SADS) (123), both of which include self-reported suicide attempt histories. Subjects were also asked about previous suicide attempts and their intent to die. Subjects were included if they either reported no suicide attempts (nonattempters; 631 subjects) or reported at least 1 suicide attempt with definite or serious intent to die (attempters; 387 subjects). All included subjects provided institutional review board-approved written consent after a complete explanation of the parent study. Additional sample set details, including demographic details (online suppl. Table S2), can be found in the online supplementary materials and methods.

Sequencing and Data Preparation

Sequencing was performed according to the NimbleGen SeqCap EZ Exome protocol (NimbleGen, Madison, WI, USA). Target capture used NimbleGen SeqCap EZ arrays v1, v2, or v2+. The v2+ array is v2 plus added targets for the promoter and untranslated regions of 1,422 neuronal postsynaptic density genes (124) and of 57 genes suspected to be important in BP. These 57 genes were drawn from a number of individual candidate gene studies, including those cited here (54, ) (online suppl. Table S3). The additional targets in the v2+ array represent potential regulatory sites that have been shown to be particularly enriched for associations with complex disease in past studies (40). Including these regulatory sites within genes of suspected importance in psychiatric disease economically increases the potential yield of the data set. Paired-end sequencing was performed on an Illumina GA-IIx or HiSeq 2000 (Illumina, San Diego, CA, USA). Individual sample data were processed through a pipeline consisting of the Burrows-Wheeler Aligner (128), SAMtools (129), BAMtools (130), Picard ( ), and the Genome Analysis Toolkit (GATK) (131). GATK UnifiedGenotyper was run on ReduceReads (85) files to genotype all subjects jointly.

Variants were predicted to be functional based on annotation data and were classified into 2 levels of evidence for both coding and noncoding ("regulatory") sites. Coding variants were annotated with ANNOVAR (86). Stopgain and essential splice site variants were classified as "coding disruptive." All nonsynonymous variants predicted to be damaging by at least 1 of 6 bioinformatic packages were combined with all coding disruptive variants under the classification "coding broad." Both coding classifications were modeled after a recent schizophrenia exome study (79). Analogously, regulatory variants were annotated with RegulomeDB (132), with scores of 1-2 classified as "regulatory narrow" and 1-6 as "regulatory broad." The final variant set included only calls and sites that passed GATK variant recalibration, Hardy-Weinberg equilibrium (P > 1 × 10^-6), depth ≥10, and genotype quality ≥20. Insertions/deletions (indels) and tri- and quad-allelic sites were removed from the data set because of the technical difficulty of calling these complex alleles accurately. Subjects were assessed with principal component analysis (PCA) and a sex-check algorithm to remove mislabeled or outlier individuals. The principal components showed no clustering by phenotype or platform/assay (online suppl. Fig. S1). In addition, subjects were required to have a depth of at least 20X in at least 70% of the targeted sites. Finally, the distribution of singleton variants did not differ between attempters and nonattempters (P = 0.29), indicating no systematic genotyping bias between groups. Additional details regarding sample preparation, platforms, assay versions, the genotyping pipeline, and quality control measures can be found in the online supplementary materials and methods.

Statistical Analyses

All single-variant tests and collapsed-variant ("gene burden") tests were performed with the R package logistf (134), which implements Firth's penalized logistic regression. The burden tests used the combined multivariate and collapsing method (133). Additional gene tests were performed with the Sequence Kernel Association Test (SKAT) (135) for autosomal genes only. Pathway analyses were divided into 2 categories. In a primary pathway analysis, 33 pathways hypothesized to be important in suicidal behavior risk were tested for association using PLINK/SEQ v0.10 ( ) SMP tests (online suppl. Table S4), which allow correction for potential systematic variant-calling biases between comparison groups. A secondary pathway analysis was performed within a more comprehensive set of 3,621 pathways derived primarily from the Molecular Signatures Database (MSigDB) (136). MSigDB includes Gene Ontology (GO) (137), Kyoto Encyclopedia of Genes and Genomes (KEGG) (138), and many other annotated functional pathways. Gene and pathway association tests examined the coding disruptive, coding broad, regulatory narrow, and regulatory broad sets independently.
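The annotation-based classes and QC filters described under Sequencing and Data Preparation amount to a small rule set. The sketch below restates those rules in plain Python for clarity. It is not the ANNOVAR, RegulomeDB, or GATK pipeline itself, and several details are assumptions:

- The `Variant` record and its field names are hypothetical.
- The consequence labels follow ANNOVAR-style naming.
- The "splicing" label is taken to denote an essential splice-site change.
- Depth and genotype quality are treated as per-call thresholds.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Variant:
    ref: str
    alt: str                              # comma-separated if multi-allelic
    passed_vqsr: bool                     # passed GATK variant recalibration
    hwe_p: float                          # Hardy-Weinberg equilibrium P value
    consequence: str                      # ANNOVAR-style label, e.g. "stopgain"
    n_damaging_predictions: int = 0       # tools (out of 6) calling it damaging
    regulome_score: Optional[str] = None  # RegulomeDB score, e.g. "1f", "2b", "5"


def passes_site_qc(v: Variant) -> bool:
    """Site filters: VQSR pass, HWE P > 1e-6, and biallelic SNVs only."""
    alts = v.alt.split(",")
    biallelic = len(alts) == 1                                 # drops tri/quad-allelic sites
    snv = len(v.ref) == 1 and all(len(a) == 1 for a in alts)  # drops indels
    return v.passed_vqsr and v.hwe_p > 1e-6 and biallelic and snv


def passes_call_qc(depth: int, genotype_quality: int) -> bool:
    """Per-genotype filters (assumed per call): depth >= 10 and GQ >= 20."""
    return depth >= 10 and genotype_quality >= 20


def functional_classes(v: Variant) -> set:
    """Return every functional class a variant falls into (classes are nested)."""
    classes = set()
    # "splicing" is assumed here to denote an essential splice-site change.
    if v.consequence in ("stopgain", "splicing"):
        classes |= {"coding_disruptive", "coding_broad"}
    elif v.consequence == "nonsynonymous" and v.n_damaging_predictions >= 1:
        classes.add("coding_broad")
    if v.regulome_score:
        category = int(v.regulome_score[0])  # "2b" -> 2; "7" (no data) -> 7
        if category <= 2:
            classes.add("regulatory_narrow")
        if category <= 6:
            classes.add("regulatory_broad")
    return classes


print(sorted(functional_classes(Variant("G", "A", True, 0.4, "stopgain"))))
# ['coding_broad', 'coding_disruptive']
```

Returning a set rather than a single label reflects the nesting described in the text. Every coding disruptive variant is also coding broad, and every regulatory narrow variant is also regulatory broad, so one variant can contribute to more than one analysis set.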

All single-variant and gene-level tests were adjusted for sex, sequencing platform/array, and the first 5 principal components for each subject. PLINK/SEQ pathway SMP tests were adjusted for sequencing platform/array version. Several of the top pathways were enriched for genes known to be involved in alcohol metabolism, so the top pathway association results were also re-analyzed separately with alcohol dependence as a covariate. All gene, pathway, and overall functional variant enrichment tests used rare variants under 2 minor allele frequency (MAF) thresholds: 0.05 (MAF05) and 0.01 (MAF01). These thresholds were applied to allele frequencies in our complete dataset and, when available, in European subjects from 1000 Genomes (139), Non-Finnish European subjects from the Exome Aggregation Consortium (ExAC 0.3; ), and the National Heart, Lung, and Blood Institute (NHLBI) GO Exome Sequencing Project ( ). To control for multiple testing, all test results were compared against conservative Bonferroni thresholds:

- single-variant tests: P < 5 × 10^-8 (~1,000,000 tests);
- gene-level tests: a conservative P < 1.0 × 10^-6 (49,452 tests) and a more liberal P < 2.5 × 10^-6 (~20,000 genes assessed);
- pathway association tests: P < 1.4 × 10^-5 (3,654 tests).

In addition, PLINK/SEQ SMP pathway association results were further assessed by permuting attempter/nonattempter labels 500 times and rerunning the analyses, which yielded an empirical, corrected P value for each pathway. For all analyses, suggestive significance was defined as any P value within approximately 1 order of magnitude of the corresponding significance threshold defined above. Additional statistical analysis details can be found in the online supplementary materials and methods.
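The single-variant and gene-burden tests were fit with logistf and adjusted for sex, platform/array, and 5 principal components. It may help to see what such a covariate-adjusted Firth model computes. The sketch below is a minimal pure-Python version of Firth's penalized logistic regression: Fisher scoring on Firth's bias-reduced score. It is illustrative only. Unlike logistf, it has no step-halving, confidence intervals, or P values, and the function names and toy data are hypothetical. A common reason to use Firth's penalty with sparse rare-variant counts is that it keeps estimates finite even when a variant is seen only in cases.

```python
import math


def _solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                for k in range(col, n + 1):
                    M[r][k] -= factor * M[col][k]
    return [M[i][n] / M[i][i] for i in range(n)]


def firth_logistic(X, y, max_iter=100, tol=1e-8):
    """Firth-penalized logistic regression coefficients.

    X: list of rows; the first column should be 1.0 (intercept), the others
       the predictor of interest and covariates (e.g. sex, platform, PCs).
    y: list of 0/1 outcomes (1 = attempter).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        prob = [1.0 / (1.0 + math.exp(-sum(b * x for b, x in zip(beta, row)))) for row in X]
        w = [q * (1.0 - q) for q in prob]
        # Fisher information X'WX
        info = [[sum(w[i] * X[i][j] * X[i][k] for i in range(n)) for k in range(p)]
                for j in range(p)]
        # Leverages h_i = w_i * x_i' (X'WX)^-1 x_i
        h = [w[i] * sum(a * z for a, z in zip(X[i], _solve(info, X[i]))) for i in range(n)]
        # Firth's modified score: sum_i x_ij * (y_i - p_i + h_i * (0.5 - p_i))
        score = [sum(X[i][j] * (y[i] - prob[i] + h[i] * (0.5 - prob[i])) for i in range(n))
                 for j in range(p)]
        step = _solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


# Toy example with complete separation: ordinary maximum likelihood diverges,
# but the Firth estimate stays finite. For one binary predictor it equals the
# log odds ratio after adding 0.5 to every cell: log((3.5 * 3.5) / (0.5 * 0.5)).
X = [[1.0, 0.0]] * 3 + [[1.0, 1.0]] * 3
y = [0, 0, 0, 1, 1, 1]
print(round(firth_logistic(X, y)[1], 4))  # 3.8918
```

Adjusting for covariates such as sex, platform, principal components, or alcohol dependence means appending those columns to each row of `X`. The coefficient of interest is then the one for the variant or burden column.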

RESULTS

This study used high-quality exome data for 1,018 BP subjects over a targeted region of approximately 26, 36, or 54 Mb, depending on the array version used. The mean depth of coverage across captured target sites was 72.6X. In addition, on average 93.9% of the target reads per subject reached at least 10X depth, the quality threshold we imposed for inclusion in our analyses (see online suppl. Table S5 for additional sample set metrics). A total of 494,475 variants were detected, with 59.9% (295,959 variants) in coding sites and 40.1% (198,516 variants) in regulatory regions. Additionally, 20% of all detected variants (99,303 variants) represent novel variation not found in existing databases. Finally, the expected transition/transversion (Ti/Tv) values for exome arrays are and for whole-genome sequencing (131). Our data show the expected values for exome sequencing (Ti/Tv range of for the SeqCap EZ v1 and v2) and for exome plus selected noncoding targets on the SeqCap EZ v2+ array (Ti/Tv = 2.83; online suppl. Table S5).

Individual Variant Test Results

No individual variant in our data was associated with attempted suicide at study-wide significance (see online suppl. Table S6 and Figure 3.1). The top individual variant result was rs , a common nonsynonymous variant in the amphiphysin (AMPH) gene, with an odds ratio (OR) of 0.61 for the minor allele and nominal P = 2.8 × 10^-5. This result fell well short of the Bonferroni threshold of 5.0 × 10^-8 required for significance. A quantile-quantile (QQ) plot of the individual variant tests at varying MAFs showed that our results followed the expected null distribution (online suppl. Figure S2). We therefore focused on assessing sets of variants in gene and pathway analyses. We examined variants that were bioinformatically predicted to be coding disruptive (N = 6,316), regulatory narrow (N = 4,830), coding broad (N = 129,735), or regulatory broad (N = 44,846), comparing each set separately between suicide attempters and nonattempters.

Gene-Level Tests

No individual gene achieved study-wide significance after correction for multiple testing. Many genes, however, achieved nominal significance (P < 0.05): 1,613 genes in total, with 909 identified in the burden analyses and 1,011 in the SKAT analyses (307 genes reached nominal significance in both test types). Online supplementary Table S7 lists the genes that achieved a nominal P value < 0.01, referred to as our top genes. Two genes reached suggestive significance: CFAP70 and SLC6A13 (Table 3.1). Functionally, CFAP70 (140) is thought to play a role in cilia function, which in turn is essential for many key developmental and cellular maintenance processes. SLC6A13 encodes a γ-aminobutyric acid (GABA) reuptake transporter that helps regulate GABA neurotransmission (141). We also closely examined all gene-based test signals in genes previously associated with suicidal behavior (see online suppl. Table S1). This collection of genes was assembled in part from the list published by Perlis et al. (31) in 2010, along with more recent genes identified through a PubMed search with the keywords "suicide," "genetic," and "association." Two previously identified genes, BDNF and DISC1, yielded nominal evidence for association (P < 0.05): OR = 1.8 with P = 4.5 × 10^-3 for BDNF and OR = 0.72 with P = 3.5 × 10^-2 for DISC1. Both signals arose in rare regulatory broad variants.

Pathway Analyses

Because no single gene survived correction for multiple testing, we examined groups of genes that operate together in biological pathways. Pathway tests afford greater power to detect signals across biologically related genes that are not significantly associated individually. They have successfully detected broad patterns of rare disruptive variation in other complex diseases such as schizophrenia (79). We first examined 33 biological pathways of potential relevance to suicidal behavior risk (online suppl. Table S4). Most of these primary pathways were selected for their roles in general neuronal development, maintenance, and function. In addition, several pathways were selected that have been specifically hypothesized to be involved in suicidal behavior risk, including the hypothalamic-pituitary-adrenal (HPA) axis, glutamatergic, and serotonergic pathways. None of the primary pathway association tests survived correction for multiple testing (P < for 33 primary pathways). The best result came from extremely rare disruptive variants in the HPA axis pathway (142), with an OR of 6.5 and a P value of . Because no primary pathway survived correction for multiple testing, we broadened our pathway investigation to include a total of 3,621 additional pathways obtained from MSigDB (136). Specifically, we selected well-described biological process pathways, such as those in the KEGG (138) and GO (137) data sets, as well as sets of genes regulated by specific transcription factors or microRNAs. These tests generated 1 promising result, the KEGG Limonene and Pinene Degradation pathway (OR = 1.7, P = 1.6 × 10^-5, permutation-corrected P = 0.10). Top contributing variants and genes for this pathway are presented in online supplementary Tables S8 and S9, respectively. In addition, several suggestive (P < 0.001) results were identified (Table 3.2). Three of these, including our top result, were driven particularly by a set of 5 overlapping aldehyde dehydrogenase genes (KEGG Limonene and Pinene Degradation, KEGG Histidine Metabolism, and KEGG Beta Alanine Metabolism; P = 1.6 × 10^-5 to 4.7 × 10^-4; Figure 3.2). Aldehyde dehydrogenase genes serve several important metabolic roles, including key functions in the metabolism of ingested alcohol, so we included subject history of alcohol dependence as a covariate in the analyses. Correcting for alcohol dependence produced essentially no change in the magnitude of the signals for these pathways (P = 1.0 × 10^-5 to 6.9 × 10^-4).

DISCUSSION

This study represents a systematic, large-scale next-generation sequencing effort to examine the phenotype of suicidal behavior. Its goal was to provide an unbiased view of the contribution of exome-wide rare functional variation to suicidal behavior risk. We pursued this goal by applying a broad range of analysis tools to a variety of variant subsets in BP subjects with and without a history of suicide attempt.

Pathway Exploration

Recent efforts to explore rare functional variation in severe psychiatric disorders, such as schizophrenia (79, 80), have successfully identified associations with disease risk. These findings rested on a large number of rare variants whose association signals were too weak to be identified individually, or even at the level of individual genes, but that together implicated common biological pathways.
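The pooling idea described above aggregates many individually weak rare variants across the genes of a pathway. It can be illustrated with a simple carrier-based collapsing statistic combined with a label-permutation P value, of the kind the study computed for its SMP pathway results using 500 permutations. This is a CAST-style sketch for intuition only. It is not the PLINK/SEQ SMP test or the combined multivariate and collapsing method used in the study. The following are all illustrative assumptions:

- the function names;
- the 0.5 cell correction;
- the one-sided add-one permutation estimator;
- the toy subjects and gene names.

```python
import math
import random


def pathway_carriers(genotypes, variant_gene, pathway_genes):
    """Subjects carrying at least one qualifying rare allele in any pathway gene.

    genotypes:    dict subject_id -> set of qualifying variant_ids it carries
    variant_gene: dict variant_id -> gene symbol
    """
    return {s for s, variants in genotypes.items()
            if any(variant_gene.get(v) in pathway_genes for v in variants)}


def carrier_log_or(carriers, cases, controls):
    """Log odds ratio of carrier status in cases vs controls (0.5 added per cell)."""
    a = len(carriers & cases) + 0.5       # carrier cases
    b = len(cases - carriers) + 0.5       # noncarrier cases
    c = len(carriers & controls) + 0.5    # carrier controls
    d = len(controls - carriers) + 0.5    # noncarrier controls
    return math.log((a * d) / (b * c))


def permutation_p(carriers, cases, controls, n_perm=500, seed=1):
    """One-sided empirical P for enrichment in cases, shuffling case/control labels."""
    rng = random.Random(seed)
    observed = carrier_log_or(carriers, cases, controls)
    subjects = sorted(cases | controls)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(subjects)
        perm_cases = set(subjects[:len(cases)])
        perm_controls = set(subjects[len(cases):])
        if carrier_log_or(carriers, perm_cases, perm_controls) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)   # add-one estimator, never exactly 0


# Toy data: made-up subjects, variants, and gene names.
cases, controls = {"A", "B", "C", "D", "E"}, {"F", "G", "H", "I", "J"}
genotypes = {"A": {1}, "B": {2}, "C": {1}, "D": {3}, "F": {4}}
variant_gene = {1: "GENE_A", 2: "GENE_B", 3: "GENE_A", 4: "GENE_Z"}
carriers = pathway_carriers(genotypes, variant_gene, {"GENE_A", "GENE_B"})
print(sorted(carriers), round(math.exp(carrier_log_or(carriers, cases, controls)), 2))
# ['A', 'B', 'C', 'D'] 33.0
print(permutation_p(carriers, cases, controls))
```

The carrier set stays fixed while the labels are shuffled. As a result, the null distribution reflects the observed pathway size and variant frequencies rather than an asymptotic approximation, which matters when only a handful of subjects carry qualifying variants.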

We used a similar approach to maximize our analytical power. We performed a hypothesis-driven investigation of 33 pathways, including a number of general neuronal pathways and a selection of pathways specifically suspected to play a role in suicidal behavior risk (online suppl. Table S4). We also assessed an additional 3,621 pathways, primarily selected from MSigDB (136). No result survived conservative Bonferroni correction for multiple testing. Even so, several potentially interesting pathways emerged, and 1, the KEGG (138) Limonene and Pinene Degradation pathway, approached significance. This result, along with 2 other top pathway results, was driven by an enrichment among suicide attempters of rare functional coding variants in aldehyde dehydrogenase (ALDH) genes (Figure 3.2). This is of potential interest given the well-established contribution of alcohol use to suicidal behavior risk (143). Indeed, in the sample investigated in this study, alcohol dependence is significantly more common among attempters than nonattempters (OR = 2.1; P = 4.9 × 10^-8). This prompted us to include each subject's history of alcohol dependence as a covariate in a secondary assessment of the top pathways. Including this covariate produced essentially no change in the magnitude of the top pathway signals (Limonene and Pinene Degradation pathway OR = 1.7, P = 1.0 × 10^-5), suggesting that alcohol dependence per se is not the primary driver of this signal. Notably, none of the 33 primary pathways assessed in this study showed strong evidence for association in our analyses, including the several hypothesized to play a role in suicidal behavior risk. These primary pathways included the heavily researched serotonergic pathway.
The serotonergic pathway has long been studied in suicidal behavior for two reasons: the central role this system plays in the treatment of psychiatric disease, and early evidence associating differential serotonin metabolite levels with suicidal behavior (144). Many subsequent genetic studies have reported associations between serotonergic genes, including SLC6A4, MAOA, TPH, TPH2, HTR1A, and HTR2A, and suicidal behavior (see online suppl. Table S1). There are several possible reasons why these previously associated serotonergic genes were not strongly enriched for functional variation in our exome data. The candidate gene studies that identified these associations focused primarily on common variants that were either synonymous coding changes or fell within noncoding regions of the genome. These attributes matter for 2 reasons. First, the associated noncoding sites were not covered by our exome-sequencing targets. Second, synonymous variants were not considered in our gene and pathway analyses because of our focus on sites with evidence for functional effects. Ultimately, expanded sequencing efforts that increase sample sizes and encompass regulatory regions may be required to identify the causal variants behind previously identified signals.

The Contribution of High-Throughput Investigation Efforts in Suicide

Existing high-throughput analyses in suicidal behavior, including GWAS, expression studies, and DNA methylation studies, have identified a broad array of potentially important genes and regions. Many of these are not among the candidate genes that have traditionally been the focus of suicidal behavior research (online suppl. Table S1). For example, GWAS have identified nominally associated sites throughout the genome. They have also identified 2 potential genome-wide significant associations (P ~ 5.0 × 10^-8) with common variants, each in at least 1 population: within the 2p25 region (32) and within the ABI3BP gene (31). Genome-wide gene expression studies have similarly implicated pathways of potential importance, such as polyamine regulation (42), general nervous system development (110), and glutamatergic and GABAergic synaptic signaling (111). Genome-wide DNA methylation studies in suicidal behavior have also identified associations with many genes involved in cognitive processes (145) and have recently implicated SKA2 as a potential suicide candidate gene (146). In like manner, our study sought to address the knowledge gap regarding the contribution of rare functional variation throughout the human exome to suicidal behavior risk. A few of our suggestive results overlap with previously identified pathways: 1 of our top genes, SLC6A13, implicates GABAergic systems, and central nervous system development appears among our top pathway findings. Beyond these examples, however, our results show minimal overlap with the genes and regions implicated by the high-throughput efforts referenced above. This may reflect fundamental differences in study design between the present work and these other genome-wide investigations. Specifically, the GWAS framework focuses on common variants, so it is unlikely to have robustly detected the effects of rare functional sites. In addition, the top regions identified in suicidal behavior GWAS lie within regulatory regions not targeted by our sequencing design. Similarly, methylation and expression analyses explore the contribution of perturbed gene regulation. Our analyses instead focused on coding variation that directly damages gene function. Therefore, the variation examined in our whole-exome sequencing effort is largely unexamined by previous efforts, yielding a novel and valuable data resource that complements existing work. In total, 20% (99,303 sites) of the high-quality variants identified in our data set were previously undescribed in existing sequencing databases, including 1000 Genomes (139), dbSNP version 142 (147), the Exome Sequencing Project ( ), and ExAC ( ). Most of the novel sites (70%, 69,214 variants) are bioinformatically predicted to have potential functional effects. They can be added to existing variant databases for further assessment in future complex disease studies and meta-analyses of suicidal behavior. Combined with growing sample sizes and existing data, these resources may help build convergent lines of evidence toward a more comprehensive and empirically supported understanding of suicidal behavior risk than is available at this time.

Study Limitations

Our study has several limitations. First, our sample had limited power to detect variant and gene signals of small to moderate effect size. We estimated 80% power to detect independent rare variants with a relative risk of 2.9 (MAF 0.05) to 7.2 (MAF 0.01) at study-wide significance (see online suppl. Figure S3). We also estimated 80% power to detect a study-wide significant gene-burden signal with a relative risk of depending on the frequency of collapsed functional variants within the gene locus across subjects. Power for burden testing will be lower if not all collapsed variants in a given gene contribute to risk (see online suppl. Figure S4). Second, the pathways we chose to examine represent a subset of all pathways available for analysis. Additionally, the examined pathways are not all-inclusive representations of the genes that may be important to the given processes. They may be missing key genes that would help explain a pathway's role in suicidal behavior. For example, one of the primary pathways we assessed, the KEGG (138) glutamatergic synapse pathway, did not include several key synaptic structural proteins, such as NRXN or LRRTM family genes, which have been implicated in suicidal behavior (32). Furthermore, analyses geared toward any single pathway may fail to capture a signal that spans several potentially associated pathways. Third, whole-exome sequencing offers an economical way to deeply examine the variation most likely to directly affect gene structure and function, but the method also has disadvantages. Exome sequencing arrays are designed to target well-described transcripts. Therefore, exome sequencing data do not comprehensively cover all coding transcripts of the genome and may miss important regions. In addition, existing genome-wide studies of complex disease have frequently identified associated loci that lie outside coding regions but are enriched near coding genes (40). Our customized array gave a fraction of the genes examined in this study limited coverage of their regulatory regions, but our design largely did not examine the vast nonexomic regions of the genome. Further assessment of the whole genome, or of regulatory regions near candidate genes, may identify additional key loci for suicidal behavior risk. Fourth, our study design did not consider all available sources of captured variation in our gene and pathway-based analyses. For example, there is evidence that synonymous variation, which we did not include in these analyses, can contribute to disease risk (148). In addition, sites of common variation (MAF

71 >0.05) were excluded from these analyses. It remains to be seen how these sources of variation, combined with rare functional variation, contribute to suicidal behavior risk. Fifth, we did not include consideration of environmental factors, such as individual abuse history, in our association tests as we did not have this information for all of the subjects. Such factors are known to increase the risk for suicidal behavior and may act synergistically with genetic variation to enhance risk. Sixth, our study population was composed entirely of subjects with a history of BP. While such homogeneity is useful to limit signals associated with phenotypes other than suicidal behavior, it also limits the generalizability of our findings to those with a BP background. Additional cohorts with differing psychiatric backgrounds will need to be assessed in order to identify the generalizability of these findings to all cases of suicidal behavior. In addition, living subjects were sequenced for this study. It remains to be seen whether these findings can be replicated in those who have died from suicide. Conclusion This study represents the first effort to examine rare functional variation across the exome through sequencing in suicidal behavior. By using comprehensive analyses and investigating variation largely unexamined by existing studies, we have provided a view of the potential contribution of rare functional variation within suicidal behavior. Our results offer suggestive evidence of many genes and pathways that may prove to be informative for suicidal behavior risk. This study provides a wealth of novel information that can be considered within future investigations. Larger sample sizes and studies focused 54

72 on whole-gene/-genome sequencing will further illuminate which signals are particularly important in suicide risk. ACKNOWLEDGEMENTS This work was supported by NIH grant R01MH (Dr. Willour), using data generated by NIH grants R01MH (Dr. Potash) and R01MH (Dr. McCombie). Additional funding was provided by the University of Iowa Medical Scientist Training Program (MSTP) training grant 5 T32 GM (E.T. Monson), American Foundation for Suicide Prevention (AFSP) PDF (Dr. Breen), and NIH grant K01MH (Dr. Pirooznia). We also wish to acknowledge the support of the University of Iowa Interdisciplinary Graduate Program in Genetics (E.T. Monson, S.C. Gaynor, Dr. Willour, and Dr. Potash). We gratefully acknowledge the preparation and distribution of samples used in this study by RUCDR Infinite Biologics. We also wish to express our gratitude to the many individuals who participated in the diagnosis and interviewing of subjects included in this study. Finally, we would like to thank the many families who offered their support and time to the study. STATEMENT OF ETHICS All subjects included in this study supplied institutional review boardapproved written informed consent. DISCLOSURE STATEMENT W.R. McCombie is a founding member of the plant genomics and cancer genetics company, Orion Genomics, and retains shares within this company. W.R. McCombie has also been provided costs associated with travel as well as honoraria for presenting at Illumina and Pacific Biosciences sponsored meetings over the past several years. None of these companies (Orion Genomics, Illumina, nor Pacific Biosciences) have played any part in the decision making, 55

73 direct support, data generation, analysis, or any other part of this study. Mr. Monson, Dr. Pirooznia, Dr. Parla, Ms. Kramer, Dr. Goes, Dr. Breen, Ms. Gaynor, Ms. de Klerk, Dr. Jancic, Dr. Karchin, Dr. Zandi, Dr. Potash, and Dr. Willour reported no biomedical financial interests or potential conflicts of interest. 56

FIGURES

Figure 3.1: Manhattan Plot of Results for Association Tests with All Variants in the Dataset. Attempters (387 subjects) vs. nonattempters (631 subjects). The horizontal bar represents the threshold for genome-wide significance based on the analysis of ~1,000,000 variants (P = 5.0 × 10^-8).

Figure 3.2: Representation of the Shared Genes in 3 of the Top Pathway Association Results that Demonstrated Strong Overlap of Included Genes. Overlapping genes between the 3 pathways are labeled.

TABLES

Table 3.1: Gene Signals for Coding and Noncoding Analyses

Gene ID | Position | Variant Set | Gene P Value | OR | Attempter Collapsed Frequency | Nonattempter Collapsed Frequency
SLC6A13 | 12p13.33 | Coding Broad, SKAT, MAF01 | 6.1* | | |
CFAP70 | 10q22.2 | Coding Broad, SKAT, MAF01 | 6.5* | | |

Collapsed frequencies refer to the frequency of individuals with at least 1 variant meeting the variant set criteria across the complete gene locus. Gene P values and ORs were adjusted for sex, platform, and the first 5 principal components for each subject. P < 1.0 × 10^-6 (conservative) or P < 2.5 × 10^-6 (liberal) required for significance via Bonferroni correction.
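The collapsed frequency defined in the Table 3.1 footnote can be illustrated with a minimal sketch. It assumes a hypothetical subjects × variants matrix of minor-allele counts that has already been restricted to variants meeting one variant-set definition (for example, Coding Broad at MAF01) within a single gene locus. The function names and toy values are illustrative only and are not taken from the original analysis code.

```python
from typing import Dict, List, Sequence


def collapse_carriers(genotypes: Sequence[Sequence[int]]) -> List[int]:
    """CMC-style collapsing: 1 if a subject carries any qualifying minor allele.

    `genotypes` holds one row per subject and one column per qualifying
    variant, coded as minor-allele counts (0, 1, or 2).
    """
    return [1 if any(count > 0 for count in row) else 0 for row in genotypes]


def collapsed_frequency(genotypes: Sequence[Sequence[int]],
                        is_attempter: Sequence[bool]) -> Dict[str, float]:
    """Fraction of attempters and nonattempters carrying >= 1 qualifying variant."""
    carriers = collapse_carriers(genotypes)
    result: Dict[str, float] = {}
    for label, flag in (("attempter", True), ("nonattempter", False)):
        group = [c for c, a in zip(carriers, is_attempter) if a == flag]
        result[label] = sum(group) / len(group) if group else float("nan")
    return result


# Hypothetical toy data: 6 subjects x 3 qualifying variants.
genos = [[0, 1, 0], [0, 0, 0], [1, 0, 2],   # attempters
         [0, 0, 0], [0, 0, 1], [0, 0, 0]]   # nonattempters
status = [True, True, True, False, False, False]
print(collapsed_frequency(genos, status))
```

Each subject contributes at most once per gene regardless of how many qualifying variants it carries, which is what makes the collapsed frequency a proportion of individuals rather than of alleles.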

Table 3.2: Pathway Analysis Results

Pathway | Variant Set | Genes, n | OR | SMP P Value | Corrected P Value
KEGG Limonene and Pinene Degradation (138) | Coding Broad MAF01 | | | |
MSigDB Micro Inhibitory RNA 494 Targets (136) | Coding Broad MAF01 | | | |
KEGG Beta Alanine Metabolism (138) | Coding Broad MAF01 | | | |
KEGG Histidine Metabolism (138) | Coding Broad MAF05 | | | |
GO Keratinocyte Differentiation (137) | Coding Broad MAF01 | | | |
GO Central Nervous System Development (137) | Coding Broad MAF01 | | | |
MSigDB CEBPA_01 TF Targets (136) | Coding Broad MAF05 | | | |

KEGG, Kyoto Encyclopedia of Genes and Genomes; GO, Gene Ontology; MSigDB, Molecular Signatures Database; TF, transcription factor. SMP P value, corrected for assay platform; P < 1.4 × 10^-5 required for significance via Bonferroni correction. Corrected P value, empirical corrected P value identified from 500 attempter/nonattempter label-swapping permutation analyses.
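The empirical corrected P values in Table 3.2 come from swapping attempter/nonattempter labels 500 times. The sketch below shows the general shape of such a label-permutation procedure under stated assumptions: it does not reproduce the SMP statistic used in the study, but substitutes the absolute difference in carrier proportions between groups, and it uses the common add-one estimator (hits + 1)/(permutations + 1), which is an assumption about how the empirical P value was computed.

```python
import random
from typing import Sequence


def carrier_difference(carrier: Sequence[int], is_case: Sequence[bool]) -> float:
    """Absolute difference in carrier proportion, cases vs. controls.

    Stand-in statistic chosen for illustration; the study used SMP.
    """
    cases = [c for c, k in zip(carrier, is_case) if k]
    controls = [c for c, k in zip(carrier, is_case) if not k]
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))


def permutation_p(carrier: Sequence[int], is_case: Sequence[bool],
                  n_perm: int = 500, seed: int = 1) -> float:
    """Empirical P value from shuffling case/control labels n_perm times."""
    rng = random.Random(seed)
    observed = carrier_difference(carrier, is_case)
    labels = list(is_case)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # group sizes are preserved; only assignments change
        if carrier_difference(carrier, labels) >= observed:
            hits += 1
    # Add-one estimator (assumption): never reports an empirical P of exactly 0.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    carriers = [1] * 20 + [0] * 20          # hypothetical: all carriers are cases
    status = [True] * 20 + [False] * 20
    print(permutation_p(carriers, status))
```

With 500 permutations the smallest attainable empirical P value under this estimator is 1/501, which bounds how strongly a permutation-corrected result can be reported.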

SUPPLEMENTAL INFORMATION

Supplementary figures and tables referenced in this article (39) can be accessed on the publisher's website, or directly at - Exome_Sequence_Data_in_Attempted_Suicide_within_a_Bipolar_Disorder_Cohort/

CHAPTER 4, COMPLETE PUBLISHED MANUSCRIPT, WHOLE-GENE SEQUENCING INVESTIGATION OF SAT1 IN ATTEMPTED SUICIDE

STUDY CONTRIBUTION

This manuscript (41) describes the investigation of the SB candidate gene spermidine/spermine N1-acetyltransferase 1 (SAT1) through complete sequencing of the gene locus and flanking regions with evidence of regulatory importance. I contributed to our research team, under the direction of Dr. Virginia Willour and Dr. Peter Zandi, by preparing computational methods for identifying subjects and targets to be sequenced, managing project data (see Appendix B), and performing all statistical analyses. I also served as first author, leading the drafting of the primary manuscript.

The goal of this project was to comprehensively sequence the SAT1 gene in an evenly distributed sample of BD subjects with and without past suicide attempts. My first role was to identify the subjects to be sequenced based on criteria established by the group. To do this, I wrote an algorithm to examine all samples available for sequencing and select those meeting the criteria. Selected samples were then automatically ordered onto plates balanced for attempters/non-attempters and male/female subjects. After the samples were selected, my next role, with guidance from Dr. Thomas Casavant and his group, was to generate sequencing targets for SAT1 plus regulatory regions outside of the gene locus. To do this, I created and applied custom scripts to merge transcript data from several gene databases stored in the UCSC Genome Browser. These merged transcripts were then used to generate sequencing targets that covered the largest possible transcript of SAT1. I additionally selected regulatory targets from ENCODE data in the UCSC Genome Browser, within ranges defined by literature evidence for such features interacting with adjacent genes (cis-regulatory features). All the generated targets were then submitted for bait design for sequencing of the gene.

Following sequencing performed by my colleague, Sophia Gaynor, the next goal was to align all the sequence data to the human reference genome and identify variants within our subjects. My role was to establish a local automated computational pipeline to align, genotype, and remove low-quality data from all of our targeted sequencing project data. To do this, I received substantial assistance from Melissa Kramer at Cold Spring Harbor Laboratory (CSHL), who provided me access to the original data processing pipeline and raw data for the RareBLISS project housed at CSHL. I used these data to establish the original pipeline locally and confirm that it produced the expected results. I then extensively modified the pipeline in accordance with updated best practices and software versions, testing several iterations and quality control settings to identify the most reliable, highest-quality dataset as compared with known GWAS genotypes.

Our final goal was to apply a comprehensive analysis strategy to assess past associations of SAT1 with suicidal behavior and identify any novel associations within the sequencing results. My role was to identify past analyses in the literature that associated SAT1 with SB, determine any new analyses to include under the guidance of Dr. Virginia Willour and Dr. Peter Zandi, and implement all the selected statistical analyses. I replicated the previously published analyses identified in the literature search as closely as possible within our own data. I also prepared gene-based assessments of rare functional variation in a similar fashion to the exome study (Chapter 3, Appendices A and B), with the addition of a linkage disequilibrium (LD) region assessment designed to capture and assess variation in an LD region particularly implicated in past association studies.

ABSTRACT

Suicidal behavior imposes a tremendous cost, with current US estimates reporting approximately 1.3 million suicide attempts and more than 40,000 suicide deaths each year. Several recent research efforts have identified an association between suicidal behavior and the expression level of the spermidine/spermine N1-acetyltransferase 1 (SAT1) gene. To date, several SAT1 genetic variants have been inconsistently associated with altered gene expression and/or directly with suicidal behavior. To clarify the role SAT1 genetic variation plays in suicidal behavior risk, we present whole-gene sequencing of SAT1 in 476 bipolar disorder subjects with a history of suicide attempt and 473 subjects with bipolar disorder but no suicide attempts. Agilent SureSelect target enrichment was used to sequence all exons, introns, promoter regions, and putative regulatory regions identified from the ENCODE project within 10 kb of SAT1. Individual variant, haplotype, and collapsing variant tests were performed. Our results identified no variant or assessed region of SAT1 that showed a significant association with attempted suicide, nor did any assessment show evidence for replication of previously reported associations. Overall, we found no evidence that SAT1 sequence variation contributes to the risk for attempted suicide. It is possible that past associations of SAT1 expression with suicidal behavior arise from variation not captured in this study, or that causal variants in the region are too rare to be detected within our sample. Larger sample sizes and broader sequencing efforts will likely be required to identify the source of SAT1 expression level associations with suicidal behavior.

INTRODUCTION

Suicidal behavior, which encompasses both suicide attempts and completed suicides, is responsible for approximately 650,000 emergency room visits each year in the United States (149) and is a leading cause of death worldwide, claiming more than 800,000 lives each year as reported by the World Health Organization (26). Accumulating evidence supports a genetic component to the risk of suicidal behavior, which has an estimated heritability of 30–50% (28). Several genetic studies have been undertaken to explain this heritability, and numerous potential risk loci have been identified. However, few findings have been consistently replicated. Among the most well-established findings is the spermidine/spermine N1-acetyltransferase 1 (SAT1) gene, which has shown altered expression in the brains of suicide completers compared to controls in several cohorts (42, 44-49). Common genetic variation within SAT1 has been associated with suicidal behavior (42, 43) and with altered expression of the gene (150), although these findings have been inconsistent (44). The numerous lines of evidence supporting a role for SAT1 in suicidal behavior risk place this gene in a unique position among suicide candidate genes, warranting detailed examination of the basis of these existing associations. SAT1 locus studies performed thus far have generally focused on only a handful of common genetic variants and/or were performed in very small cohorts, leaving the contribution of SAT1 genetic variation to suicidal behavior risk unclear.

Functionally, SAT1 is a key enzyme in the polyamine processing pathway, which regulates polyamine compounds such as spermidine, spermine, and putrescine. Polyamines play many essential roles in gene regulation, cellular maintenance, cell function, cell fate determination, and oxidative stress response (151). Initial interest in SAT1 in psychiatric disease arose from observations that polyamine levels shift transiently within the brain in response to stressful stimuli (152). One potential relationship to suicidal behavior lies in the observation that intracellular polyamine concentrations significantly alter the function of neuronal ion channels, including excitatory glutamatergic receptors (151). In addition, recent evidence has demonstrated that lithium treatment significantly alters SAT1 transcription levels in subjects at risk for suicidal behavior as well as in normal controls, but not in subjects who died by suicide (153). SAT1 therefore has some biological plausibility as a contributor to suicidal behavior risk.

In this study, we present the first complete gene sequencing effort of the SAT1 locus in a sample of individuals with bipolar disorder (BP). This population contains 476 BP individuals with a past history of suicide attempt ("attempters") and 473 BP individuals with no past suicide attempts ("nonattempters"). We employed next-generation targeted sequencing to capture all coding regions and many regulatory regions of the SAT1 locus. This design provides a detailed examination of genetic variation within or near SAT1, including rare and potentially functional variants that most existing study designs would likely not have captured. As a result, this study provides the largest and most comprehensive examination of SAT1 genetic variation in attempted suicide performed to date.

MATERIALS AND METHODS

Study Subjects

This study is composed of 949 unrelated individuals of European American descent who had previously been diagnosed with BP. This sample was taken from the same population used and described in our past genome-wide association study (GWAS) (32). Briefly, these subjects were ascertained as part of the National Institute of Mental Health (NIMH) Genetics Initiative Bipolar Disorder Collaborative study, as described elsewhere (122, 154, 155). All included subjects were diagnosed following RDC, DSM-III-R, or DSM-IV criteria, with 948 subjects meeting criteria for bipolar disorder, type 1 (BP1) and one subject meeting criteria for schizoaffective disorder, bipolar type (SA-BP). All subjects were interviewed using the Diagnostic Interview for Genetic Studies (DIGS) versions 1–4 (122), which includes questions about past suicide attempts and intent to die. Attempters were defined as any individual with at least one self-reported suicide attempt with moderate to serious intent to die (476 subjects). Non-attempters were defined as any individual with no self-reported past suicide attempt (473 subjects). Within these groups, there were 252 female and 224 male attempters and 251 female and 222 male non-attempters. Individuals were included for analysis only after receiving a description of the original study and completing an institutional review board-approved written informed consent document.

Sequencing

A custom SureSelect target enrichment assay (Agilent Technologies, Santa Clara, CA) was designed via the SureDesign Custom Design tool, using custom scripts to select the gene region from University of California Santa Cruz (UCSC) Table Browser data tracks (156) for Ensembl, UCSC, GENCODE v17, and RefSeq genes. The assay was designed to capture the entire SAT1 locus across all alternative transcripts represented in any of these datasets, plus 2 kb upstream of all transcriptional start sites to capture promoter elements. In addition, any region with overlapping signals for DNase hypersensitivity and ChIP-Seq-identified transcription factor binding sites from ENCODE project version 2 (157) within 10 kb up- or downstream of the largest merged SAT1 transcript was targeted for sequencing.

Sample preparation for sequencing was performed according to established Agilent protocols using a Sciclone Caliper robotic system (PerkinElmer, Waltham, MA) at the University of Iowa Genomics Division. Briefly, 3 μg of high-quality genomic DNA was sheared in a Covaris E220 ultrasonicator (Covaris, Woburn, MA), and the sheared fragments were quality-assessed on an Agilent Bioanalyzer. DNA fragments were end-repaired, adenylated (A-tailed), and ligated to identifying adapters to generate sequencing libraries prior to hybridization. Multiplex sequencing of 16 samples per lane was performed on a HiSeq 2000 (Illumina, San Diego, CA) following standard Illumina protocols.

Data Preparation

A customized pipeline was developed to process samples and perform quality control. Briefly, samples were aligned to the human reference genome (hg19/GRCh37) using the Burrows-Wheeler Aligner (BWA) version (128). Further processing was performed with SAMtools version (129), BAMtools version (130), and Picard version 1.88. Base quality score recalibration and realignment around suspected insertion/deletion (indel) sites were performed with the Genome Analysis Toolkit (GATK) version (85). Finally, genotypes of single nucleotide polymorphisms (SNPs) and indels were generated jointly in all 949 samples via the GATK HaplotypeCaller GVCF method. Genotyping was followed by rounds of variant recalibration in the GATK using recommended best practices. Variant calls with depth ≤10 or genotype quality score ≤20, heterozygous X-chromosome calls in males, and Y-chromosome calls in females were removed from the dataset. Variants were removed if they violated Hardy-Weinberg equilibrium (P < 1 × 10^-6) as assessed with PLINK version 1.07 (158), if they had missing calls for >10% of subjects, or if they failed variant recalibration.

Individual subject samples were subjected to rigorous quality control as part of the original GWAS from which all samples were derived (32). Subject data for the present study were assessed by comparing genotypes against existing GWAS genotypes, yielding a call match rate of 99.8%. Subjects were also checked for expected sex using the PLINK version 1.07 check-sex algorithm, with no deviations identified.

All variants identified in the SAT1 gene were annotated for position and function using ANNOVAR (version March 22, 2015) (86) and RegulomeDB (version 1.1) (132). Rare variants predicted to have functional effects were classified for analysis separately within translated ("coding") and untranslated/noncoding ("regulatory") regions of the SAT1 gene. Variants in coding and regulatory regions were placed into categories reflecting the level of evidence for a functional effect. "Coding disruptive" and "coding broad" classifications were applied to coding functional variants following the example of recent exome analyses in schizophrenia (79, 80). Coding disruptive variants comprise all stopgain, essential splice site, and frameshift variants. Coding broad variants include all disruptive variants plus any variant predicted to be damaging by any of six bioinformatic packages included in the ANNOVAR (86) annotation: SIFT (159), PolyPhen-2 HDIV and HVAR (160), LRT (161), MutationTaster (162), and VEST version 3 (163).
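The call-level filters and coding classification described above can be sketched as simple predicates. This is a hypothetical illustration rather than the study's pipeline: the `Call` record and the consequence labels are invented for the example, and the thresholds assume that the depth and genotype quality cut-offs mean ≤10 and ≤20.

```python
from dataclasses import dataclass
from typing import Sequence, Set


@dataclass
class Call:
    """One subject's genotype call at one site (hypothetical record)."""
    chrom: str             # "1".."22", "X", or "Y"
    depth: int             # read depth (DP)
    gq: int                # genotype quality (GQ)
    is_het: bool           # heterozygous genotype
    subject_is_male: bool


def keep_call(call: Call) -> bool:
    """Apply the per-call filters described under Data Preparation."""
    if call.depth <= 10 or call.gq <= 20:     # low depth or low genotype quality
        return False
    if call.chrom == "X" and call.subject_is_male and call.is_het:
        return False                           # males are hemizygous for X
    if call.chrom == "Y" and not call.subject_is_male:
        return False                           # no Y calls expected in females
    return True


# Hypothetical labels, not ANNOVAR's exact output vocabulary.
DISRUPTIVE = {"stopgain", "essential_splice", "frameshift"}


def coding_sets(consequence: str, damaging_calls: Sequence[bool]) -> Set[str]:
    """Return the coding variant sets a variant belongs to.

    `damaging_calls` holds one flag per predictor (SIFT, PolyPhen-2 HDIV and
    HVAR, LRT, MutationTaster, VEST3); any single damaging call qualifies.
    """
    if consequence in DISRUPTIVE:
        return {"coding_disruptive", "coding_broad"}  # disruptive is a subset of broad
    if any(damaging_calls):
        return {"coding_broad"}
    return set()
```

Returning a set rather than a single label reflects the nesting described in the text: every coding disruptive variant is also a member of the coding broad set.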

Regulatory variants were similarly classified into two functional categories using RegulomeDB annotations (132). As suggested in the literature, variants with scores ≤2, which are likely to affect binding, were defined as "regulatory narrow." Scores of 3–6 represent variants less likely to affect binding; these were grouped with all regulatory narrow variants to form a broad regulatory variant group (scores 1–6; "regulatory broad") (132).

Statistical Analysis

We examined all detected variants within the SAT1 regions independently using Firth's penalized logistic regression via the R logistf package version 1.21 (134). All tests used the count of minor alleles as the basis of the association test and were adjusted for sex and the first three principal component analysis (PCA) components for each subject. Additional tests collectively assessed rare variants at two minor allele frequency (MAF) thresholds: MAF ≤0.05 (MAF05) and MAF ≤0.01 (MAF01). These rare variants were also required to have a bioinformatically predicted functional effect, as described above. Identified rare functional variants were assessed as a single group across a given genetic region ("gene burden"). Gene burden tests were performed by collapsing rare functional variants by subject using the CMC collapsing method (133), so that each subject had rare functional variation either present or absent for each assessed gene/region. The gene burden tests were assessed using Firth's penalized logistic regression (logistf version 1.21 for R (134)), which is suitable for the small sample sizes typical of rare variant comparisons. All gene burden tests included as covariates sex and the first three components from a PCA based on GWAS data available for all subjects.

88 Secondary analyses were also performed to assess additional features of interest within the SAT1 gene locus and to assess subpopulations of the dataset. A focused gene burden test was performed on a 4.3 kb promoter/upstream LDblock identified using Haploview v4.2 (164) via the confidence interval analysis method initially published by Gabriel et al. (165). This 4.3 kb block closely corresponds to the region of SAT1 where the majority of the past suicidal behavior and SAT1 expression associations with genetic variation have been identified (42, 43, 150). A haplotype analysis of all variants in the same 4.3 kb block was also performed by Haploview with default settings, including a 10,000 permutation test within Haploview to provide corrected P-values of the haplotype analyses. Briefly, all haplotypes with a frequency of >0.01 were considered, the data were specified to be on the X-chromosome, and specified as a case/control sample. Finally, we performed male- and female-specific assessments within all the described analyses to identify any sex-specific results. All assessments were corrected for multiple testing using the conservative Bonferroni method. Using this approach, the single variant threshold is 7.5x10-4 (67 total variants identified) and the region-based threshold is 6.25x10-3 (eight total primary tests). For single variant tests, we also employed a second method of correction for multiple testing due to the strong LD structure of SAT1 by correcting for the number of LD blocks in the region. This resulted in a significance threshold of (three blocks). Power Analysis Study power was calculated via Quanto version This analysis was performed using the parameters of matched case-control, an alpha of 7.5x10-4, estimated population risk of suicidal behavior of 4.6%, log-additive inheritance for 71

89 allele frequencies of , and estimating sample power for a relative risk of RESULTS Single Variant Testing An overview of all 67 detected unique variant sites within the SAT1 locus and their frequencies within our dataset are shown within Figure 4.1 along with all association results in supplementary Table S1. None of the identified signals survived correction for multiple testing. Single variant association tests identified one variant with a nominal P- value<0.05 within the SAT1 locus. The top signal identified was rs :g>a (odds ratio = 11, nominal P = 0.031). This variant is very rare (MAF = within our dataset), was only identified within suicide attempters, and is within the 3 untranslated region (UTR) of the SAT1 gene. Sex-specific tests were also performed within individual variants because a number of the existing associations with SAT1 reported in the literature arise from male-only cohorts (42, 43, 150). A sex-specific assessment of all individual variants did not identify any variants associated at nominal significance (P<0.05). The strongest variant signal identified within the sex-specific assessment was rs , the same variant identified as the top overall signal. rs was predominantly found in females with a history of suicide attempt (four female attempters; odds ratio = 8.9, nominal P = 0.058), was seen in only a single male attempter, and not seen in non-attempters. The placement of rs was assessed in relation to the linkage disequilibrium (LD) structure of the entire sequenced area via haploview v4.2 (164) to ascertain any relationship it might have to previously identified variants associated with suicidal behavior in the SAT1 locus. This variant fell outside of an 72

90 upstream LD block of approximately 4.3 kb that encompassed the previously associated rs :c>a SNP and rs indel (Figure 4.2). Previously Associated Variants Variants within the SAT1 locus that were previously associated with suicidal behavior and/or SAT1 expression were also examined within our dataset for replication (Table 4.1). rs (42, 150) showed no evidence for association with attempted suicide within our dataset, neither in the all-subject analysis (OR = 0.99, P = 0.92) nor the sex-specific (male OR = 1.3, P = 0.29; female OR = 0.89, P = 0.44) analyses. For rs , it is also important to note that our male-specific finding with OR > 1 implicates the A allele (minor allele) as the risk allele, which is opposite in direction to the previously reported findings that were assessed in an all-male sample. The second variant that has been reported to be associated with suicidal behavior is the indel, rs (43). This site did not pass the required quality thresholds we imposed on our dataset due to limitations in next-generation sequencing alignment techniques, which preclude confidently resolving highly repetitive or simple sequence regions such as those around rs However, two variants previously identified to be in strong LD with rs (rs928931:t>c and rs928932:g>a) were successfully captured with high quality reads. These two variants reside within 100 bp rs indel, and neither variant showed any evidence for association with attempted suicide in any population group assessed in our study (supplementary Table S1). Region-Based And Haplotype Assessments Primary analyses of the coding and regulatory regions of SAT1 encompassing all rare (MAF 0.05 and MAF 0.01) variants predicted to have functional consequences yielded no results that could survive correction for 73

multiple testing (Table 4.2). Likewise, no signals were detected in any of the sex-specific analyses or in the promoter-only analyses focused on the 4.3 kb upstream LD block of the SAT1 locus, where previous associations with suicidal behavior have been reported (Table 4.2). A haplotype analysis of this 4.3 kb upstream block was also performed using Haploview v4.2 (164). This analysis identified five common haplotypes (>1% frequency) for the 4.3 kb region, which together accounted for the configurations observed in 98.4% of our subjects. None of the haplotypes showed any trend toward association with attempted suicide in any of the subject groups assessed (all attempters vs. non-attempters, and male- and female-specific assessments; supplementary Table S2).

DISCUSSION

To our knowledge, this study represents the largest and most comprehensive sequencing effort performed on the SAT1 gene locus to date. The study was designed to help determine whether common and rare genetic variation within the SAT1 locus contributes significantly to suicidal behavior risk. The lack of any significant signal after correction for multiple testing, across a broad range of analyses and regardless of the subject or variant set assessed, offers compelling evidence that genetic variation of moderate to high effect within the SAT1 gene is not a significant contributor to attempted suicide risk. Several possible explanations for why we did not identify an association with variation in the SAT1 locus bear consideration in light of these results.

Study Population and Size

Previous studies that have identified significant associations of SAT1 variants with suicidal behavior have been performed in notably different samples
compared with the BP case-only attempter/non-attempter population examined in the current study. The closest match to our design was an examination of suicide attempters from varied psychiatric backgrounds compared with normal controls (166), which found suggestive associations within the SAT1 locus after relaxing the false discovery rate threshold used for its primary analyses. The remaining genetic associations were identified within a French Canadian cohort with a known founder effect and a primary psychiatric diagnosis of major depressive disorder (42, 43, 167). These sample differences call for caution when comparing results directly.

Study size must also be considered carefully. Our study was well powered to detect associations of the magnitude (OR = ) and within the MAF range ( ) reported for the previously associated SAT1 variants (42, 43). We estimated 80% power to detect a signal of OR = 2.0 at an MAF of within our dataset. Despite this power, population differences between studies could lead to weaker signals within the associated regions. In addition, very rare functional variants within the SAT1 locus may be present but would not be detectable without a much larger sample size.

Linkage Disequilibrium of the Region

Prior studies note a strong LD structure in the vicinity of the promoter and transcriptional start site of the SAT1 gene, where observed suicidal behavior association signals were concentrated (150). We also detected this LD region, covering approximately 4.3 kb and encompassing the first exon of SAT1 and the upstream promoter region of the gene (Figure 4.2). Capturing this region allowed a detailed examination of sites previously associated with suicidal behavior and provided a reference for the strongest signals detected in our own analyses.
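
The LD reasoning used in this section, that an association at one site should also surface at nearby sites in strong LD with it, rests on the standard pairwise LD measures. The sketch below is a minimal illustration, not the study's pipeline: it assumes phased two-site haplotype frequencies are already known, and the function name and argument layout are hypothetical. It computes D, Lewontin's D′, and r² for two biallelic sites.

```python
def ld_stats(p_ab, p_a, p_b):
    """Pairwise LD between two biallelic sites (illustrative sketch).

    p_ab: frequency of haplotypes carrying allele A at site 1 and allele B at site 2
    p_a, p_b: frequencies of allele A at site 1 and allele B at site 2
    Returns (D, D_prime, r_squared). D_prime keeps its sign; take abs() for a
    |D'| display such as the one in Haploview plots.
    """
    d = p_ab - p_a * p_b
    # Lewontin's normalization: D_max depends on the sign of D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r_squared = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r_squared


# Hypothetical frequencies: allele B always sits on an A haplotype, so D' = 1,
# yet r^2 is only about 0.26 because the two allele frequencies differ.
print(ld_stats(p_ab=0.10, p_a=0.30, p_b=0.10))
```

The distinction matters for proxy arguments like the one made for the unresolved indel: under the usual approximation, the expected association chi-square at a proxy is roughly r² times that at the causal site, so a neighbor in complete LD by D′ but with low r² would carry only a weak echo of a true signal.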

Unfortunately, technological limitations in aligning repetitive or low-complexity regions, such as the one containing the previously associated indel rs , prevented us from confidently examining this variant directly. However, a true association of rs with suicidal behavior within the 4.3 kb LD region would be expected to be reflected in the association signals of nearby variants in high LD with it, as was observed by Fiori et al. (150). Several such variants within this LD block were well represented in our data, but none showed any evidence of association with attempted suicide, reducing the likelihood that an association is present at rs within our data. In addition, the top single-variant signal detected in this study, rs , lies outside the previously implicated 4.3 kb LD block, suggesting that this site is unlikely to be tagged by, or otherwise related to, the previously associated sites. The position of rs does, however, make it a potential candidate for altering regulatory sites and gene expression. Specifically, rs sits within a predicted binding motif for the poorly characterized zinc finger transcription factor ZNF35, within the 3′ UTR of the SAT1 transcript, as shown by ENCODE (157) evidence available in the RegulomeDB (132) dataset. This variant is also predicted to lie within an actively transcribed region based on chromatin state assessment from the Roadmap Epigenomics Project (168). Nevertheless, rs and its potential functional consequences should be interpreted cautiously until it is replicated in additional samples and assessed alongside SAT1 expression data.

Other Explanations for Observed Expression Changes

The lack of significant variant associations with attempted suicide within the SAT1 locus suggests other potential causes for the previously associated
changes in expression. Recent work by Lopez et al. (169) has suggested that these changes may be associated with altered regulation by several miRNAs thought to regulate SAT1 expression. In addition, epigenetic alterations within SAT1, dysregulation of any of the many regulatory genes or small-molecule inducers of SAT1, alterations within distant trans-regulatory sites or enhancers for SAT1, and/or environmental factors could also explain differences in its expression. Indeed, Niola et al. (170) recently reported that SAT1 expression changes induced by lithium in cell lines were not associated with genetic variation in the SAT1 locus of those cell lines, implicating other effectors of the altered expression.

Study Limitations

Our study had several limitations that must be considered when interpreting our findings. First, although our sequencing probes covered the entire gene transcript locus, only part of the upstream LD block investigated in previous studies was captured, because our design focused on regions with multiple lines of evidence for functionality in the ENCODE dataset. Variants in regions not covered by our design may be relevant to the regulation of SAT1 expression and to suicidal behavior. In addition, the suicidality phenotype of our subjects differed from that used in the majority of previous positive SAT1 association studies (42, 43, 150). Many of the previous variant associations were made by comparing suicide completers with psychiatric or normal controls, as opposed to our attempter versus non-attempter design. We attempted to control for this discrepancy by sequencing only individuals with high suicidal intent. However, it remains to be
clarified whether all underlying genetic risk factors of those who die by suicide are shared and detectable in those who attempt suicide.

CONCLUSION

This study has provided a detailed examination of the SAT1 gene locus and found evidence suggesting that genetic variation of moderate to high effect size within the SAT1 locus may not contribute significantly to attempted suicide risk. This finding also suggests that the SAT1 expression changes associated with suicidal behavior may arise from other sources that have yet to be clarified. This study therefore helps direct attention toward other potential regulatory sources of SAT1 and toward broader sequencing efforts to elucidate the source of the many observed associations between SAT1 expression and suicidal behavior.

ACKNOWLEDGEMENTS

This work was supported by NIH grant R01MH (Dr. Willour). Additional funds were provided by the University of Iowa Medical Scientist Training Program (MSTP) training grant 5 T32 GM (Eric Monson) and American Foundation for Suicide Prevention (AFSP) PDF (Dr. Breen). We also wish to acknowledge the support of the University of Iowa Interdisciplinary Graduate Program in Genetics (Eric Monson, Sophia Gaynor, Dr. Willour, and Dr. Potash).

CONFLICTS OF INTEREST

All authors declare no conflicts of interest in the funding or direction of this research.

FIGURES

Figure 4.1: Representation of the SAT1 Locus Variants. Detected variants are shown as vertical bars representing the minor allele frequency of each variant in attempters (red bars) and non-attempters (blue bars). Gene structures for the merged SAT1 transcript, incorporating all alternative transcripts of the gene, are represented by colored blocks as follows: green = translated exons, yellow = 5′ UTR exons, black = intronic regions, grey = 3′ UTR exons, dark blue = promoter region, and light blue = regulatory regions within 10 kb of the largest transcript (note: the 10 kb regulatory blocks are compressed by a factor of 4, so each 10 kb region is represented by a 2.5 kb block). rs , a variant previously associated with suicidal behavior in multiple studies, is marked by a red star and labeled.

Figure 4.2: Representation of the LD Structure of the Sequenced Region of Variants. Variants are labeled by rs-number IDs (dbSNP version 142) and are included in the image if the minor allele was detected in at least two study subjects. Blocks of high linkage disequilibrium (LD) are outlined with black borders, and the strength of each pairwise LD relationship is colored from white (weakest) to dark red (strongest), as measured by D′ and the log of the odds (LOD) for linkage between the two sites (D′/LOD). The variant labeled in green is the strongest single-variant signal in our study, and the variant labeled in red is a previously associated promoter variant identified in multiple studies.
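
The D′/LOD coloring in Figure 4.2 pairs D′ with a LOD score for LD. As a rough sketch of what such a LOD measures, assuming phased two-site haplotype counts are available (tools such as Haploview instead estimate haplotype frequencies from unphased genotypes), the score below compares the likelihood of the observed haplotype frequencies with the likelihood expected if the two sites were independent, on a log10 scale. The function name and input format are hypothetical, and Haploview's exact computation may differ.

```python
import math


def ld_lod(counts):
    """log10 likelihood ratio for LD versus independence (illustrative sketch).

    counts: phased haplotype counts keyed "AB", "Ab", "aB", "ab", where the
    first letter is the allele at site 1 and the second the allele at site 2.
    """
    n = sum(counts.values())
    p_a = (counts["AB"] + counts["Ab"]) / n  # frequency of allele A at site 1
    p_b = (counts["AB"] + counts["aB"]) / n  # frequency of allele B at site 2
    freq_site1 = {"A": p_a, "a": 1 - p_a}
    freq_site2 = {"B": p_b, "b": 1 - p_b}
    lod = 0.0
    for hap, k in counts.items():
        if k == 0:
            continue  # a zero count contributes nothing (0 * log 0 = 0)
        observed = k / n
        expected = freq_site1[hap[0]] * freq_site2[hap[1]]  # independence
        lod += k * math.log10(observed / expected)
    return lod


# Hypothetical counts: only AB and ab haplotypes are seen (complete LD)
print(ld_lod({"AB": 30, "Ab": 0, "aB": 0, "ab": 70}))
```

A large D′ with a small LOD typically reflects too few informative haplotypes to be confident in the D′ estimate, which is why the two are displayed together.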

TABLES

Table 4.1: Covered Variants Previously Associated with Suicidal Behavior and/or SAT1 Expression

Columns: SNP ID (dbSNP 142); Chr; Locus (hg19/GRCh37); Attempter MAF; Non-Attempter MAF; Attempter Versus Non-Attempter Logistic Regression Results by Subject Group (All OR; All pval; Male OR; Male pval; Female OR; Female pval)
Rows: rs (42, 150), Chr X; rs (150), Chr X

Chr = chromosome; OR = odds ratio; MAF = minor allele frequency; pval = uncorrected P-value
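
Table 4.1 reports per-variant odds ratios from attempter-versus-non-attempter logistic regression. With rare alleles, a variant can be carried by members of only one group, and ordinary maximum likelihood then drives the log-odds ratio toward infinity. Firth's penalized-likelihood correction, which the glutamatergic study in Chapter 5 applies through the logistf R package, keeps such estimates finite. The sketch below is a minimal pure-Python illustration of Firth's modified-score iteration, not logistf itself; the function names, step cap, and convergence settings are assumptions.

```python
import math


def _invert(m):
    """Invert a small square matrix (list of lists) by Gauss-Jordan elimination."""
    n = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        lead = a[col][col]
        a[col] = [v / lead for v in a[col]]
        for r in range(n):
            if r != col:
                factor = a[r][col]
                a[r] = [vr - factor * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]


def firth_logistic(X, y, max_iter=200, tol=1e-10, max_step=5.0):
    """Firth bias-reduced logistic regression via Newton-type iterations.

    X: list of rows, each starting with 1.0 for the intercept (then e.g.
    carrier status, sex); y: list of 0/1 outcomes. Returns log-odds coefficients.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        mu = [1.0 / (1.0 + math.exp(-sum(b * x for b, x in zip(beta, row)))) for row in X]
        w = [m * (1.0 - m) for m in mu]
        # Fisher information X'WX and its inverse
        info = [[sum(w[i] * X[i][j] * X[i][k] for i in range(n)) for k in range(p)]
                for j in range(p)]
        cov = _invert(info)
        # Diagonal of the hat matrix: h_i = w_i * x_i' (X'WX)^-1 x_i
        h = [w[i] * sum(X[i][j] * cov[j][k] * X[i][k] for j in range(p) for k in range(p))
             for i in range(n)]
        # Firth-modified score: sum_i (y_i - mu_i + h_i * (1/2 - mu_i)) * x_i
        score = [sum((y[i] - mu[i] + h[i] * (0.5 - mu[i])) * X[i][j] for i in range(n))
                 for j in range(p)]
        step = [sum(cov[j][k] * score[k] for k in range(p)) for j in range(p)]
        largest = max(abs(s) for s in step)
        if largest > max_step:  # cap the step to keep early iterations stable
            step = [s * max_step / largest for s in step]
        beta = [b + s for b, s in zip(beta, step)]
        if largest < tol:
            break
    return beta


# Hypothetical carrier table: all 4 carriers are attempters (complete separation)
X = [[1.0, 1.0]] * 4 + [[1.0, 0.0]] * 22
y = [1] * 4 + [1] * 10 + [0] * 12
print(firth_logistic(X, y))
```

A convenient check: with a single binary predictor, Firth's estimate of the log odds ratio equals the log odds ratio computed after adding 0.5 to every cell of the 2x2 table.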

Table 4.2: Gene-Burden Results

Assessed regions (rows): SAT1 Whole Gene Coding Broad; SAT1 Whole Gene Coding Disruptive; SAT1 Whole Gene Reg Broad; SAT1 Whole Gene Reg Narrow; SAT1 4.3 kb LD block Coding Broad; SAT1 4.3 kb LD block Coding Disruptive; SAT1 4.3 kb LD block Reg Broad; SAT1 4.3 kb LD block Reg Narrow
Columns (each split into MAF ≤ 0.05 and MAF ≤ 0.01): All Odds Ratio*; All Gene-Burden Pval; Male Odds Ratio*; Male Gene-Burden Pval; Female Odds Ratio*; Female Gene-Burden Pval

*Odds ratios are corrected for sample covariates. Any result denoted by a dash "-" represents a region that had no rare functional variation within the given population of our data sample, and thus could not be analyzed for that group. MAF = minor allele frequency; Reg = regulatory; Pval = P-value corrected for covariates but uncorrected for multiple comparisons (uncorrected P-value < required for significance).
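
The gene-burden columns in Table 4.2 rest on collapsing rare, functional variants into one value per subject. The sketch below assumes the collapsing rule stated in the Chapter 5 Methods (a subject scores 1 if they carry at least one qualifying variant in the region, otherwise 0); the function and argument names are hypothetical. The resulting 0/1 vector would then enter a covariate-adjusted logistic regression, which is where the adjusted odds ratios come from.

```python
def collapse_burden(genotypes, maf, functional, maf_cutoff):
    """Collapse rare, functional variants into a 0/1 burden value per subject.

    genotypes: one list per subject of minor-allele counts (0, 1, 2, or None if missing)
    maf: minor allele frequency of each variant
    functional: True if the variant passes the functional filter (e.g. broad or disruptive)
    maf_cutoff: rarity threshold, e.g. 0.05 or 0.01
    """
    qualifying = [j for j in range(len(maf)) if functional[j] and maf[j] <= maf_cutoff]
    # Missing calls (None) are skipped; any nonzero minor-allele count makes a carrier
    return [1 if any(subject[j] for j in qualifying if subject[j] is not None) else 0
            for subject in genotypes]


def has_carriers(burden):
    """A region with no carriers in a group cannot be tested (shown as '-')."""
    return any(burden)


# Hypothetical data: 4 subjects x 3 variants
genotypes = [[0, 1, 0], [0, 0, 0], [2, 0, None], [0, 0, 1]]
maf = [0.004, 0.03, 0.2]
functional = [True, True, True]
print(collapse_burden(genotypes, maf, functional, 0.05))  # [1, 0, 1, 0]
print(collapse_burden(genotypes, maf, functional, 0.01))  # [0, 0, 1, 0]
```

Running the collapse at both MAF thresholds mirrors the paired MAF ≤ 0.05 and MAF ≤ 0.01 columns: the stricter cutoff keeps fewer variants and so yields fewer carriers.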

SUPPLEMENTAL INFORMATION

Supplementary figures and tables referenced in this article (41) can be accessed on the publisher's website.

CHAPTER 5, COMPLETE PUBLISHED MANUSCRIPT, A TARGETED SEQUENCING STUDY OF GLUTAMATERGIC CANDIDATE GENES IN SUICIDE ATTEMPTERS WITH BIPOLAR DISORDER

STUDY CONTRIBUTION

This manuscript (52) details an investigation of suicidal behavior (SB) focused on 16 genes that represent the glutamatergic N-methyl-D-aspartate (NMDA) receptor gene family as well as two interacting synaptic adhesion gene families, the neurexin (NRXN) and neuroligin (NLGN) genes. My roles in this manuscript are largely described in the introduction to Chapter 4, with a few notable exceptions highlighted here. The sequencing strategy for these 16 genes differed from that used for SAT1 because these gene loci are generally much larger, and fully sequencing them would have required a substantial increase in project expense. Under the direction of Dr. Virginia Willour and with assistance from Dr. Thomas Casavant and his group, my role was to identify sequence targets within and near the selected genes that would offer the greatest potential regulatory sequence yield while minimizing the number of targeted bases. To do this, I performed a literature search to identify which features would be most likely to provide useful regulatory information. Based on this information, I targeted all coding exons, untranslated exons, the 2 kb immediately upstream of each possible transcriptional start site for all alternative transcripts of each gene, and any site within 10 kb of each gene locus with overlapping DNase hypersensitivity and transcription factor binding site evidence from merged ENCODE UCSC data tracks. Our collaborative group checked and refined these targets over several months to ensure that only sites likely to be sequenced successfully and meeting our criteria would be captured.
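
The regulatory-site rule described above (keep sites where DNase hypersensitivity and transcription factor binding evidence overlap, within 10 kb of each gene) is an interval-intersection problem. The sketch below is a minimal illustration assuming single-chromosome, 0-based half-open intervals (the convention used by UCSC tables); clipping overlapping sites to the gene ± 10 kb window is our simplification, and all names and coordinates are hypothetical.

```python
def merge(intervals):
    """Sort and merge overlapping or touching half-open intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def intersect(a, b):
    """Intersect two sorted, non-overlapping interval lists with a two-pointer sweep."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval finishes first
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def regulatory_targets(gene, dnase, tfbs, flank=10_000):
    """Sites with both DNase and TFBS evidence, inside the gene +/- flank window."""
    window = [(max(0, gene[0] - flank), gene[1] + flank)]
    both = intersect(merge(dnase), merge(tfbs))
    return intersect(both, window)


# Hypothetical coordinates for one gene and a few ENCODE-style clusters
gene = (50_000, 60_000)
dnase = [(30_000, 41_000), (45_000, 46_000), (80_000, 81_000)]
tfbs = [(40_500, 40_800), (45_500, 47_000), (65_000, 66_000)]
print(regulatory_targets(gene, dnase, tfbs))  # [(40500, 40800), (45500, 46000)]
```

Merging each evidence track first keeps the sweep linear in the number of intervals; the TFBS cluster at 65,000 falls inside the window but has no DNase support, so it is dropped.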

Following sequencing of these targets, performed by my colleague Sophia Gaynor, the analytical goal for this manuscript was to comprehensively investigate the coding and regulatory data for each gene and for the complete gene set. My role, under the guidance of Dr. Virginia Willour and Dr. Peter Zandi, was to design and implement the analytical strategy at the individual-variant, gene, and pathway levels. To do this, variant and gene analyses were prepared using custom scripts as described in other chapters and Appendix B, and pathway analyses were designed following a similar pattern. Specifically, I implemented two approaches to the pathway tests: an additive PLINK/SEQ burden assessment and the sequence kernel association test (SKAT). These complementary approaches were selected to allow the evaluation of signals involving both damaging and protective variation.

ABSTRACT

Suicidal behavior has been shown to have a heritable component that is partly driven by psychiatric disorders (171). However, there is also an independent factor contributing to the heritability of suicidal behavior. We previously conducted a genome-wide association study (GWAS) of bipolar suicide attempters and bipolar non-attempters to assess this independent factor (32). This GWAS implicated glutamatergic neurotransmission in attempted suicide. In the current study, we conducted a targeted next-generation sequencing study of the glutamatergic N-methyl-D-aspartate (NMDA) receptor, neurexin, and neuroligin gene families in 476 bipolar suicide attempters and 473 bipolar non-attempters. The goal of this study was to gather sequence information from coding and regulatory regions of these glutamatergic genes to identify variants associated with attempted suicide. We identified 186 coding variants and 4,298 regulatory variants predicted to be functional in these genes.
No individual variants were overrepresented in cases or controls to a degree that was statistically significant after correction for multiple testing. Additionally, none of the gene-level results were statistically significant following correction. While this study provides no direct support for a role of the examined glutamatergic candidate genes, further sequencing in expanded gene sets and datasets will be required to determine whether genetic variation in glutamatergic signaling influences suicidal behavior.

INTRODUCTION

Completed suicide is the tenth leading cause of death among all age groups in the United States and the second leading cause of death in year olds (172, 173). There are an additional 25 suicide attempts for each completed suicide, making suicidal behavior a common and devastating phenotype (172). Epidemiological studies have consistently shown that the risk for suicidal behavior is heritable, with current heritability estimates ranging from 30% to 50% (28). This heritability comprises two main factors. The first is a liability to psychiatric disorders (171). The second genetic factor is independent and may span multiple psychiatric disorders. This independent factor is currently hypothesized to be impulsive aggression, suggesting that the highest-risk individuals have both this trait and a psychiatric disorder (28, 171). The earliest studies searching for genetic susceptibility loci in suicidal behavior used candidate gene approaches, many of which focused on the serotonergic system (120). However, when these studies failed to produce consistent results, multiple large-scale studies, including linkage analyses and genome-wide association studies (GWAS), were conducted to identify genetic contributors to suicidal behavior (32, 111, 113, 114, 116, 174, 175). These studies have provided evidence for a role of glutamatergic signaling in the risk for
such behavior (32, 111, 175). Additionally, lithium and ketamine, two drugs effective in treating suicidal behavior, are hypothesized to affect glutamatergic signaling: both long-term lithium treatment and acute ketamine treatment attenuate glutamatergic signaling (Fig. 5.1) (50, 51). The involvement of these medications in treating suicidal behavior further supports the hypothesis that glutamatergic signaling has a role in this phenotype. Mammals have three neurexin (NRXN) genes, five neuroligin (NLGN) genes, and eight NMDA receptor genes (see Materials and Methods for the complete gene list) ( ). All three gene families are located at the synapse, where they play a role in glutamatergic signaling. Presynaptic NRXN proteins and postsynaptic NLGN proteins form a cell adhesion complex involved in synapse stabilization and specification (177). In addition, NLGN proteins interact with NMDA receptors, a type of ionotropic glutamate receptor, through scaffolding protein complexes (Fig. 5.1) (179). In this study, we present a targeted sequencing assessment of these three glutamatergic gene families in 476 bipolar suicide attempters and 473 bipolar non-attempters. This study provides a novel and in-depth examination of sequence variation in a set of candidate genes, with the goal of identifying additional genetic susceptibility loci for suicidal behavior.

MATERIALS AND METHODS

Sample Collection

Subject information was collected as part of the National Institute of Mental Health (NIMH) Genetics Initiative (180). For this study, we used subjects from our attempted suicide GWAS, which included individuals of European-American descent (Table 5.1) (32). All subjects were unrelated and had a diagnosis of bipolar I (BPI) or schizoaffective disorder,
bipolar type (SABP). Attempters were defined by a positive answer to the question in the Diagnostic Interview for Genetic Studies (DIGS): "Have you ever tried to kill yourself?" (122). We selected only attempters who indicated definite or serious intent to die, as determined by the DIGS questionnaire. A total of 476 suicide attempters (224 males and 252 females) and 476 non-attempters (224 males and 252 females) were included for sequencing. Institutional Review Board-approved written informed consent was provided by all subjects prior to enrollment.

Target Determination

Our target regions were derived from the following genes: GRIN1, GRIN2A, GRIN2B, GRIN2C, GRIN2D, GRIN3A, GRIN3B, GRINA, NRXN1, NRXN2, NRXN3, NLGN1, NLGN2, NLGN3, NLGN4X, and NLGN4Y. For each of these genes, both coding and non-coding (regulatory) regions were included in the sequencing target. The coding regions consisted of all exons of all alternative transcripts defined by Ensembl, GENCODE, RefSeq, or the UCSC Genes track of the UCSC Genome Browser ( ). The regulatory genomic regions comprised all alternative promoter regions (defined as the 2 kb upstream of each transcription start site), intron-exon boundaries (±50 bp), and any putative ENCODE regulatory elements (157). These regulatory elements included any genomic regions, either intronic or within a 10 kb flanking sequence of each gene, that were defined by ENCODE as both a transcription factor binding site and a DNase hypersensitivity site (UCSC Genome Browser tables wgEncodeRegDnaseClusteredV2 and wgEncodeRegTfbsClusteredV2) (185). Sequencing probes for these regions were designed using the SureDesign Custom Design Tool software from Agilent (Santa Clara, CA).
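
The promoter and intron-exon boundary rules above translate directly into interval arithmetic. The sketch below is a minimal illustration assuming 0-based half-open coordinates on a single chromosome, and it reads "intron-exon boundaries (±50 bp)" as padding every exon by 50 bp on each side, which is our interpretation. The Transcript structure, function names, and coordinates are hypothetical.

```python
from typing import List, NamedTuple, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end)


class Transcript(NamedTuple):
    start: int            # leftmost transcribed base
    end: int              # one past the rightmost transcribed base
    strand: str           # "+" or "-"
    exons: List[Interval]


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort and merge overlapping or touching intervals."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def transcript_targets(transcripts: List[Transcript],
                       promoter_len: int = 2000, pad: int = 50) -> List[Interval]:
    intervals: List[Interval] = []
    for tx in transcripts:
        # The promoter lies upstream of the TSS, so its side depends on strand
        if tx.strand == "+":
            intervals.append((max(0, tx.start - promoter_len), tx.start))
        else:
            intervals.append((tx.end, tx.end + promoter_len))
        # Pad each exon so the +/-50 bp intron-exon boundaries are captured
        for s, e in tx.exons:
            intervals.append((max(0, s - pad), e + pad))
    # Alternative transcripts overlap heavily, so merge into one target set
    return merge(intervals)


txs = [
    Transcript(10_000, 20_000, "+", [(10_000, 10_500), (15_000, 15_200), (19_800, 20_000)]),
    Transcript(12_000, 20_000, "-", [(12_000, 12_300), (19_800, 20_000)]),
]
print(transcript_targets(txs))
# [(8000, 10550), (11950, 12350), (14950, 15250), (19750, 22000)]
```

The minus-strand transcript shows why strand matters: its promoter lands to the right of the gene (20,000 to 22,000) and merges with the padded final exon.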

Next Generation Sequencing

We used Agilent's SureSelectXT Target Enrichment next-generation sequencing technology to obtain sequencing data for these 16 genes. To start, 3 µg of high-quality genomic DNA was sheared using the Covaris (Woburn, MA) E220 ultrasonicator. Library preparation (end repair, A-tail addition, and adapter ligation), hybridization, and target selection were carried out using standard protocols on the Sciclone Next-Generation Sequencing (NGS) Robotics Workstation (PerkinElmer: Waltham, MA) at the Iowa Institute of Human Genetics Genomics Division. The final prepared samples were run on an Illumina (San Diego, CA) HiSeq 2000 in multiplexed pools of 16 samples per lane. The 100 bp paired-end reads were aligned to the hg19 human reference genome using the Burrows-Wheeler Aligner (BWA: v0.6.2) (128). SAMtools (v0.1.18) (129), BAMtools (v2.2.3) (130), and Picard (v1.88) were used for indexing and duplicate removal. The Genome Analysis Toolkit (GATK: v3.1.1) was used to determine quality scores, coverage depth, and genotypes (85). The GATK HaplotypeCaller was used to call indels and single nucleotide polymorphisms (SNPs) (85), with all samples called as a single group. Quality control tests for quality-to-depth ratio, mapping quality, strand bias, and haplotype score were run for each variant via GATK, and any variants that failed these tests were removed. We also removed variants that were missing calls in >10% of subjects or that failed Hardy-Weinberg equilibrium (P < ). Genotype calls with a depth of coverage <10X (supplementary Table SI) or a genotype quality score <20 were set to missing, as were any heterozygous calls on the X chromosome of male subjects (excluding the pseudoautosomal region) and any Y chromosome calls in female subjects. We only included variants with a minor allele frequency (MAF) ≤ 0.05 in
our dataset, the European 1000 Genomes dataset (139), the Exome Aggregation Consortium dataset (Cambridge, MA), and the NHLBI GO Exome Sequencing Project dataset (Seattle, WA). We also compared subject calls with our existing attempted suicide GWAS data and removed three non-attempter subjects with high levels of genotype mismatch, making our final sample 476 bipolar suicide attempters and 473 bipolar non-attempters (32). Overall, the variant calls in this study had a 99.75% concordance rate with our GWAS results. Finally, we examined the 1000 Genomes SNPs covered in our study (186). We covered 96.1% (317 out of 330) of the 1000 Genomes SNPs with a MAF between 0.01 and For SNPs with a MAF > 0.05, we covered 98.5% (870 out of 883) of the 1000 Genomes SNPs.

Statistical Analyses

We performed both individual-variant and gene-level tests. The individual-variant tests were done using logistic regression with Firth's correction method (187) within the logistf R package (v1.21) (134). The individual-variant tests included sex and the first three principal components from our attempted suicide GWAS as covariates (32). Because alcohol dependence and duration of illness differed significantly between our attempters and non-attempters (Table 5.1), we also ran secondary analyses in which alcohol dependence and duration of illness were included as covariates. However, these analyses did not significantly affect the top findings from our primary analysis (data not shown). Because we examined 5,209 variants with MAF ≤ 0.05 (supplementary Tables SII and SIII), the threshold for study-wide significance following the conservative Bonferroni correction for multiple testing was For a less conservative multiple testing correction, we also used
Haploview (v4.2) (164) with the confidence intervals method (165) to determine the number of linkage disequilibrium (LD) blocks within our 16 genes. We identified a total of 118 LD blocks across our genes, making the threshold for study-wide significance (when corrected for the number of LD blocks rather than the number of variants) We also performed gene-level assessments for each of the 16 genes examined. These gene-level assessments were run separately on coding regions and regulatory regions. We also looked at two different groups of variants, classified as disruptive and broad (79). The disruptive variants for coding regions included any frameshift, stopgain/nonsense, or essential splice site variants identified by ANNOVAR (86). Coding variants that received a broad classification included all disruptive coding variants and any additional non-synonymous variants predicted to be damaging by at least one of six bioinformatics packages (SIFT (159), Polyphen2 HVAR (160), Polyphen2 HDIV (160), MutationTaster (162), LRT (161), VEST (163)). For the regulatory regions, the disruptive classification included any variants that received a score of 1 or 2 from the Regulome database (v1.1), meaning that they are likely to interfere with the binding of regulatory proteins (132). The broad classification for regulatory regions included all regulatory disruptive variants plus any variants that received a score of 3 to 6 from the Regulome database (132). The gene-level assessment of these variant groups included two types of tests: a gene-burden test and a sequence kernel association test (SKAT: v1.0.9) (135). The gene-burden test used a previously described collapsing method (133) to collapse rare, functional variants by subject over each gene. Subjects that have one or more rare, functional variants within the assessed
gene are given a burden value of 1 for that gene, while subjects with no rare, functional variants are given a burden value of 0. The resulting burden values were assessed with the logistf R package (v1.21) (134), using logistic regression with Firth's correction method (187), to determine whether the presence of variant burden across the gene was associated with attempter or non-attempter status. SKAT was run as a complementary gene-level approach on autosomal genes only (i.e., all genes except NLGN3, NLGN4X, and NLGN4Y). SKAT uses a multivariate regression framework that allows variants within a gene to have effects in opposite directions (risk or protective) without those effects cancelling out. The gene-level tests included sex and the first three principal components from our attempted suicide GWAS as covariates (32). We also ran secondary analyses in which alcohol dependence and duration of illness were included as covariates; however, these analyses did not significantly affect the top findings from our primary analysis (data not shown). Because we ran a total of eight tests per gene, the study-wide significance threshold for the gene-level tests was Pathway analyses were also done on the set of 16 genes using PLINK/SEQ (v0.10) (158) and SKAT. The PLINK/SEQ analysis used the burden method with 10,000 permutations. The SKAT analyses used the Davies method within the regression framework for the autosomal genes in our gene set. A total of eight tests were run for the pathway analysis, so the threshold for study-wide significance was The PLINK/SEQ analysis was corrected for sex, and the SKAT analysis was corrected for sex and the first three principal components from our attempted suicide GWAS as covariates (32).

RESULTS

We examined the NMDA receptor, NRXN, and NLGN gene families using both individual-variant and gene-level tests. The individual-variant tests look for
specific variants that are significantly more common in attempters or non-attempters, while the gene-level tests look for an enrichment of variants in a given gene in attempters or non-attempters. We identified a total of 5,209 variants (supplementary Table SIII), of which 46 reached nominal significance (P < 0.05) for association with attempted suicide. Of the 5,209 variants, 186 coding variants and 4,298 regulatory variants were predicted to be functional. While our Q-Q plot shows that common variants follow the expected distribution (supplementary Fig. S1), variants with a MAF ≤ 0.05 do not, as our sample was underpowered to detect significant associations for rare variants. The top individual-variant test result was rs , a nonsynonymous exonic variant in the GRIN3A gene (Table 5.2; P = , odds ratio (OR) = 19.21). Because past research has demonstrated sex-specific differences in suicidal behavior, we also performed male- and female-specific individual-variant tests. In the male-specific analysis, 23 variants reached nominal significance (supplementary Table SIV). The top male-specific variant was rs , an intronic variant in GRIN3A (P = , OR = 3.56). In the female-specific analysis, 25 variants reached nominal significance (supplementary Table SV). The top female-specific variant was rs , an intronic variant in NRXN2 (P = , OR = 0.063). We also examined the NMDA receptor, NRXN, and NLGN gene families using a gene-level approach. The top result for the gene-level tests was the GRIN3A gene (Table 5.3, supplementary Tables SVI and SVII). This top finding came from the SKAT test of broad coding variants and produced a P-value of As was done for the individual-variant tests, we performed male- and female-specific gene-level tests. The top result for the male-specific gene-level tests came from the SKAT test of broad coding variants in the NLGN1 gene
(Table 5.3, supplementary Tables SVIII and SIX; P = ). For the female-specific tests, the top result was from the SKAT test of broad coding variants in GRIN3A (Table 5.3, supplementary Tables SX and SXI; P = ). No gene-level results remained significant following correction for multiple testing. As a final test, we ran a pathway analysis to determine whether this set of 16 genes was more enriched for variants than would be expected by chance. However, this pathway analysis provided no evidence for an enrichment of variants within our gene set (supplementary Table SXII).

DISCUSSION

Glutamate is the main excitatory neurotransmitter in the brain. When released from the presynaptic neuron, glutamate binds to receptors such as NMDA receptors to propagate the excitatory signal to the postsynaptic neuron. NMDA receptors are a class of ionotropic glutamate receptors that have been implicated in synaptic plasticity, learning, and memory (179, 188). They are heterotetrameric proteins typically composed of two obligatory GRIN1 subunits combined with GRIN2 (A-D) or GRIN3 (A-B) subunits (189), potentially alongside a glutamate-binding protein, GRINA (190). NMDA receptors have been implicated in impulsivity and aggression (191, 192), and three NMDA receptor genes (GRIN1, GRIN2C, GRINA) have altered expression levels in postmortem brains of suicide completers (193). These features make NMDA receptors and their interacting partners, such as the NRXN and NLGN gene families, prime candidates for involvement in the impulsive aggression phenotype and suicidal behavior. Interestingly, two NLGN genes (NLGN1, NLGN4Y) and two NRXN genes (NRXN1, NRXN3) also have altered expression levels in postmortem brains of suicide completers, suggesting that these genes may play a role in suicide etiology (193).

112 Based on the roles of the NRXN, NLGN, and NMDA receptor proteins in glutamatergic synapse maintenance (194) and their implication in multiple psychiatric disorders ( ), as well as in suicidal behavior (32, 198), we decided to perform a targeted sequencing study to look for variants in these genes that might be associated with suicidal behavior. However, no results from this study achieved study-wide statistical significance following correction for multiple testing. One potential reason for the lack of significant results is that we did not examine all genes involved in glutamatergic signaling. We chose to look at the NRXN, NLGN, and NMDA receptor gene families due to their interactions with one another and previous evidence implicating them in psychiatric disorders. However, there are additional families of glutamatergic genes that were not examined through this approach. These additional gene families include other glutamate receptors such as AMPA receptors, metabotropic receptors, and kainate receptors, and several genes encoding these types of receptors are differentially expressed (198) or differentially methylated (175) in suicidal behavior. Future studies should examine sequence variation in these other glutamatergic gene families to fully understand the relationship between glutamatergic signaling and suicidal behavior. These studies should also investigate how genetic variation in glutamatergic genes alters the effects of ketamine and lithium on suicidal behavior. While we did not examine all possible glutamatergic genes, several of the genes we did sequence had previous evidence for involvement in suicidal behavior that we were unable to replicate in this study. A GWAS conducted by our lab had previously implicated NRXN3 in attempted suicide among females (32). While several of our top female-specific variants occurred in NRXN3 (supplementary Table SV), none of these variants survived correction for multiple 95
testing, nor did NRXN3 occur among our top female-specific gene-level results. GWAS examines common variants, whereas this study focused specifically on rare variants (MAF 0.05). Thus, the genetic contribution of NRXN3 to the risk of attempted suicide in females could be due mainly to common variation. Studies in larger sample sets examining both common and rare variation are needed to determine the role of NRXN3 in attempted suicide.

Previous evidence also suggested an association between GRIN2B and suicidal behavior. Sokolowski et al. (166) performed a family-based study of candidate SNPs and found a novel association between two GRIN2B SNPs and suicidal behavior. They also discovered a specific four-SNP GRIN2B haplotype that was associated with a higher risk for suicide. In our study, GRIN2B was the second-best overall gene-level finding (P = 0.011, OR = 6.43), but it did not appear among any of our top individual-variant results. Again, while Sokolowski et al. examined common SNPs, our study focused on rare variants. It is of interest, however, that the variants that make up the risk haplotype identified by Sokolowski et al. localize to the same linkage disequilibrium block as the majority of the variants contributing to our gene-level GRIN2B finding. Future studies should aim to examine both common and rare GRIN2B variants in independent sample sets. The rationale for these future studies is strengthened by the recent finding that GRIN2B expression is increased in suicide attempters with major depressive disorder, particularly in females (198).

Our analytic strategy also allowed us to test for evidence of rare-variant burden across the entire collection of candidate genes. To do this, we conducted a pathway analysis that tested for rare-variant gene burden in attempters versus non-attempters. Pathway analyses are especially valuable for complex phenotypes like suicidal behavior, because risk variants may be spread
across multiple interacting genes rather than localized within a single candidate gene. The best result from our pathway analysis came from the SKAT test of disruptive coding variants (P = 0.085). While this result showed a trend towards significance, larger sample sizes are needed to determine whether this trend represents a true increase in the risk for suicidal behavior conferred by rare variation within our candidate genes.

There are several limitations to our study. The first is that our sample size, while substantial, was only large enough to detect a genotypic relative risk 2.6 with 80% power and a MAF of 0.05 at study-wide significance for the individual-variant tests (calculated using Quanto v1.2.4; For the gene-level tests, we could detect a genotypic relative risk 2.2 with 80% power and a MAF of 0.05 at study-wide significance. This power was likely further reduced because not all variants in a given locus affect the risk for suicide. We note that samples of more than 2,000 subjects will likely be needed to detect the modest effects (genotype relative risk 1.5) expected for rare suicidal behavior risk genotypes. A second inherent limitation of our study design is that we used a targeted sequencing approach. We designed this study with the goal of sequencing the regions most likely to affect gene function, including both coding exons and potential regulatory regions. The way in which we defined these regions, however, may have excluded additional regulatory genomic regions. The third limitation is that we took a candidate gene approach, so we may have missed variation in other glutamatergic genes that contributes significantly to suicidal behavior. A final limitation of our study is that individuals classified as non-attempters at the time of interview could potentially attempt suicide later in
life. Our study design therefore favors those who attempt earlier in life, potentially biasing our study towards early-onset attempters.

Here we have presented a targeted sequencing study of the NMDA receptor, NRXN, and NLGN gene families in suicide attempters and non-attempters. Although none of our results reached study-wide statistical significance, further sequencing in expanded gene sets and datasets will be necessary to determine whether genetic variation in glutamatergic signaling influences suicidal behavior.

ACKNOWLEDGMENTS

This study was funded by NIH grant R01MH (Dr. Willour). Other funding sources included the American Foundation for Suicide Prevention (AFSP) PDF (Dr. Breen), the University of Iowa Medical Scientist Training Program (MSTP) training grant 5 T32 GM (Eric Monson), and the University of Iowa Interdisciplinary Graduate Program in Genetics (Sophia Gaynor, Eric Monson, Dr. Willour, and Dr. Potash).

CONFLICTS OF INTEREST

None.

FIGURES

Figure 5.1: Effect of Drug Treatments on Glutamatergic Signaling. Both chronic lithium treatment and acute ketamine treatment inhibit NMDA receptors (shown in purple). These NMDA receptors interact with postsynaptic NLGN proteins (shown in green) through scaffolding protein complexes. These NLGN proteins interact with presynaptic NRXN proteins (shown in blue) to stabilize the synapse and determine synapse specificity.

TABLES

Table 5.1: Sample Set Demographics

Characteristic | Suicide attempters | Non-attempters
Sex (%): Male | 224 (47.06%) | 222 (46.93%)
Sex (%): Female | 252 (52.94%) | 251 (53.07%)
Age [a]: Average (Min-Max) | (19-82) | (18-88)
Ethnicity: European-American [b] | |
Diagnosis: BP I | |
Diagnosis: SABP | 1 | 0
Alcohol Dependence | 194 (40.76%) [c] | 148 (31.29%)
Duration of Illness | 25.6 years [d] | 22.6 years
Total | 476 | 473

BP I, bipolar I; SABP, schizoaffective disorder, bipolar type. All subjects are unrelated.
[a] One subject did not provide age information.
[b] A principal component analysis demonstrated that all subjects were consistent with having a European-American background.
[c] Attempters had a significantly higher incidence of alcohol dependence (P = ).
[d] Attempters had a significantly longer duration of illness (P = ).

Table 5.2: Top Individual-Variant Results

Gene | Chromosome Location [a,b] | Position | dbSNP142 [c] | P-value [d] | Odds Ratio [e] | Attempter MAF | Non-Attempter MAF

GRIN3A chr9: (T/C) exonic rs GRIN2A chr16: (A/T) ncRNA_exonic NRXN3 chr14: (CT/C) upstream NULL (2.43-2,480.1) rs ( ) (2.16-2,260.6) NLGN4X chrX: (G/C) intronic rs ( ) NRXN3 chr14: (G/C) intronic rs NRXN3 chr14: (G/T) intronic rs ( ) ( ) NRXN3 chr14: (C/T) intronic rs ( ) NRXN3 chr14: (C/A) intronic rs ( ) NLGN1 chr3: (A/G) intronic rs ( ) NLGN1 chr3: (A/G) intronic rs ( )

MAF, minor allele frequency; ncRNA, non-coding RNA. Results are shown for our sample set of 476 bipolar suicide attempters and 473 bipolar non-attempters.
[a] Chromosome location was determined using the UCSC Genome Browser (hg19).
[b] Variants are listed as (major allele/minor allele).
[c] "NULL" designates a novel variant.
[d] Uncorrected P-values are shown. No P-values survived correction for multiple testing.
[e] Odds ratios shown are for the minor allele. Numbers in parentheses indicate the 95% confidence interval for the odds ratio.

Table 5.3: Gene-Level Results (P < 0.05)

Gene | Sample Group | Test | Variant Type | Genomic Region | Number of Variants Assessed [a] | P-value [b] | Odds Ratio [c] | Attempter Frequency [d] | Non-Attempter Frequency [d]
GRIN3A | All | SKAT | Broad | Coding | | | NULL | |
GRIN2B | All | Gene-burden | Broad | Coding | | 0.011 | 6.43 | |
GRIN1 | All | SKAT | Disruptive | Regulatory | | | NULL | |
GRIN3B | All | Gene-burden | Disruptive | Coding | | | (1.24-1,442.4) | |
NLGN1 | Male | SKAT | Broad | Coding | | | NULL | |
GRIN3A | Male | SKAT | Broad | Regulatory | | | NULL | |
NLGN1 | Male | Gene-burden | Disruptive | Regulatory | | | | |
NRXN1 | Male | SKAT | Broad | Regulatory | | | NULL | |
GRIN3A | Female | SKAT | Broad | Coding | | | NULL | |

SKAT, sequence kernel association test. Results are shown for our sample set of 476 bipolar suicide attempters (224 males and 252 females) and 473 bipolar non-attempters (222 males and 251 females).
[a] Denotes the number of unique variant sites included in the assessment for a given result.
[b] Uncorrected P-values are shown. No P-values survived correction for multiple testing.
[c] SKAT analyses do not generate odds ratios. Numbers in parentheses indicate the 95% confidence interval for the odds ratio.
[d] Indicates how often variants occurred within each gene in attempters and non-attempters.
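The gene-burden rows in Table 5.3 rest on collapsing each subject's qualifying rare variants into a single carrier indicator before regression. The Python sketch below illustrates only that collapsing step. It assumes the CMC-style approach described for the FKBP5 study in Chapter 6, where a subject counts as a carrier if they carry at least one qualifying rare allele in the gene. Several details are our assumptions: the function names, the input layout, and the reading of footnote d as a per-group carrier proportion. The covariate-corrected logistic regression that would follow is not shown.

```python
from typing import Dict, Sequence


def collapse_carriers(rare_allele_counts: Dict[str, Sequence[int]]) -> Dict[str, int]:
    """Collapse per-variant rare-allele counts into one indicator per subject.

    rare_allele_counts[subject_id] lists the minor-allele count (0, 1 or 2) at each
    qualifying variant in one gene (hypothetical input layout).
    Returns 1 if the subject carries at least one qualifying rare allele, else 0.
    """
    return {sid: int(any(c > 0 for c in counts)) for sid, counts in rare_allele_counts.items()}


def carrier_frequency(indicator: Dict[str, int], group: Sequence[str]) -> float:
    """Proportion of subjects in `group` flagged as carriers by the collapsing step."""
    if not group:
        raise ValueError("group must contain at least one subject")
    return sum(indicator[sid] for sid in group) / len(group)


# Toy example: two attempters (A1, A2) and two non-attempters (N1, N2), three variants
counts = {"A1": [0, 1, 0], "A2": [0, 0, 0], "N1": [0, 0, 0], "N2": [0, 0, 0]}
carriers = collapse_carriers(counts)
print(carrier_frequency(carriers, ["A1", "A2"]))  # → 0.5
print(carrier_frequency(carriers, ["N1", "N2"]))  # → 0.0
```

The indicator ignores how many rare alleles a subject carries and treats them as acting in the same direction. This is why burden tests are usually paired with SKAT, which does not assume a shared direction of effect.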

SUPPLEMENTAL INFORMATION

Supplementary figures and tables referenced in this article (52) can be accessed on the publisher's website,

CHAPTER 6, COMPLETE PUBLISHED MANUSCRIPT, TARGETED SEQUENCING OF FKBP5 IN SUICIDE ATTEMPTERS WITH BIPOLAR DISORDER

STUDY CONTRIBUTION

The focus of this manuscript (53) was to perform the most comprehensive genetic assessment of FK506 binding protein 5 (FKBP5), an SB candidate gene that has received considerable attention. My contribution to this manuscript is largely described in the contribution section of chapter 4 and in Appendix B. Within our collaborative team, led by Dr. Virginia Willour, my role had two parts. The first was to design the methods for selecting subjects and targets that met our criteria for sequencing the FKBP5 locus. The second was to prepare and implement the statistical analyses used to assess, and attempt to replicate, previous FKBP5 associations with SB. Analyses specific to this manuscript are described here. The first unique goal of the FKBP5 study was to investigate the linkage disequilibrium (LD) structure of the gene, given several past haplotype associations with SB. My role in this investigation was to identify and apply a suitable method for identifying haplotypes within the sequenced FKBP5 locus and testing them for association. To do this, I wrote a custom script to convert the sequence data obtained for the FKBP5 gene into a PLINK v1.07-compatible dataset. The PLINK v1.07 --recodeHV option was then used to generate linkage-format data files for processing in Haploview v4.2. Haploview v4.2 was used to identify LD blocks across the FKBP5 locus from our sequencing data. It also assessed common haplotypes within these blocks using its built-in association testing, with permutation-based correction for multiple testing.
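The custom formatting script described above is not reproduced in the text. As an illustration only, the Python sketch below shows one way biallelic single-nucleotide calls could be written as whitespace-delimited PLINK PED/MAP text, the kind of input that the PLINK step above could then convert for Haploview. The record layouts, function names, variant ID and coordinate are hypothetical. Indels are rejected because this sketch encodes each allele as a single character.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical record layouts used only for this sketch
Subject = Tuple[str, str, bool]           # (sample_id, sex, is_attempter)
Variant = Tuple[str, str, int, str, str]  # (chromosome, variant_id, bp_position, ref, alt)


def dosage_to_alleles(dosage: Optional[int], ref: str, alt: str) -> Tuple[str, str]:
    """Map an alternate-allele count (0, 1 or 2) to a PED allele pair; None is a missing call."""
    if dosage is None:
        return ("0", "0")  # PLINK's default missing-genotype code
    pairs = {0: (ref, ref), 1: (ref, alt), 2: (alt, alt)}
    if dosage not in pairs:
        raise ValueError(f"unexpected dosage {dosage!r}")
    return pairs[dosage]


def to_ped_map(
    subjects: Sequence[Subject],
    variants: Sequence[Variant],
    dosages: Dict[str, List[Optional[int]]],
) -> Tuple[List[str], List[str]]:
    """Return (ped_lines, map_lines); dosages[sample_id][j] is the call for variants[j]."""
    for _, vid, _, ref, alt in variants:
        if len(ref) != 1 or len(alt) != 1:
            raise ValueError(f"{vid}: this sketch only handles single-nucleotide variants")

    # MAP columns: chromosome, variant ID, genetic distance (0 = not used), base-pair position
    map_lines = [f"{chrom} {vid} 0 {pos}" for chrom, vid, pos, _, _ in variants]

    ped_lines = []
    for sample_id, sex, is_attempter in subjects:
        calls = dosages[sample_id]
        if len(calls) != len(variants):
            raise ValueError(f"{sample_id}: expected {len(variants)} calls, got {len(calls)}")
        sex_code = {"male": "1", "female": "2"}.get(sex, "0")  # 0 = unknown sex
        phenotype = "2" if is_attempter else "1"  # PLINK coding: 2 = affected, 1 = unaffected
        # First six PED columns: family ID, individual ID, father, mother, sex, phenotype
        fields = [sample_id, sample_id, "0", "0", sex_code, phenotype]
        for (_, _, _, ref, alt), dosage in zip(variants, calls):
            fields.extend(dosage_to_alleles(dosage, ref, alt))
        ped_lines.append(" ".join(fields))
    return ped_lines, map_lines


# Toy example: one hypothetical SNV, one heterozygous attempter, one uncalled non-attempter
ped, map_ = to_ped_map(
    [("S1", "female", True), ("S2", "male", False)],
    [("6", "var_1", 1000, "A", "C")],
    {"S1": [1], "S2": [None]},
)
print(ped[0])   # S1 S1 0 0 2 2 A C
print(map_[0])  # 6 var_1 0 1000
```

Coding attempters as "affected" keeps attempter status in the standard PED phenotype column, so no extra phenotype file is needed in this sketch.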

An additional goal of the manuscript was to test variants of interest within our complete BD sample, which included over 1,000 additional subjects who were not sequenced. My role, under the guidance of Dr. Virginia Willour, was to prepare replication and mega-analyses for these variants, which were genotyped in the additional samples by Dr. Marie Breen. To do this, I prepared scripts that combined our sequencing data with the SNP genotyping results obtained for the remaining samples. These scripts then assessed overall and sex-specific associations with a covariate-corrected logistic regression model, in separate replication and mega-analyses.

ABSTRACT

FKBP5 is a critical component of the Hypothalamic-Pituitary-Adrenal (HPA) axis, a system which regulates our response to stress. It forms part of a complex of chaperones that inhibits cortisol binding and glucocorticoid receptor translocation to the nucleus. Variations in both the HPA axis and FKBP5 have been associated with suicidal behavior. We developed a systematic, targeted sequencing approach to investigate coding and regulatory regions in or near FKBP5 in 476 bipolar disorder suicide attempters and 473 bipolar disorder non-attempters. Following stringent quality control checks, we performed single-variant, gene-level and haplotype tests on the resulting 481 variants. Secondary analyses investigated whether sex-specific variations in FKBP5 increased the risk of attempted suicide. One variant, rs , showed an excess of minor alleles in suicide attempters that was statistically significant following correction for multiple testing (Odds Ratio=6.65, P-value=7.5 x 10⁻⁴, Permuted P-value=0.038). However, this result could not be replicated in an independent cohort (Odds Ratio=0.90, P-value=0.78). Three female-specific and four male-specific variants of nominal significance were also identified (P-value < 0.05).

The gene-level and haplotype association tests did not produce any significant results. This comprehensive study of common and rare variants in FKBP5 examined both regulatory and coding regions in relation to attempted suicide. One rare variant remained significant following correction for multiple testing but could not be replicated. Further investigation is required in larger sample sets to fully elucidate the association of this variant with suicidal behavior.

INTRODUCTION

Every hour throughout the world, approximately ninety-one people die by suicide, and reports suggest that suicide attempts are over twenty times more frequent (26). The complexity of attempted and completed suicide, collectively termed suicidal behavior, stems from the combination of biological, behavioral and environmental factors that can increase the risk of this phenotype (199). One of the most common risk factors is the presence of a psychiatric disorder, predominantly mood disorders such as bipolar disorder, with almost 90% of suicide attempters having a preceding psychiatric condition (200, 201).

One current hypothesis in the field of suicide genetics is that suicidal behavior is associated with an altered stress response. The Hypothalamic-Pituitary-Adrenal (HPA) axis is a tightly controlled collection of interacting proteins that help regulate our response to stress. Dysregulation of the HPA axis causes an abnormal stress response and has been linked to suicidal behavior ( ). One critical component of the HPA axis is the co-chaperone FK506 Binding Protein 5 (FKBP5; ENSG ). FKBP5 binds to the glucocorticoid receptor (GR) as part of a complex of chaperones. This complex inhibits cortisol binding and receptor translocation to the nucleus (205). Because of its crucial role in the HPA axis, specific focus has been placed on the FKBP5 gene and variations within it that might be associated with suicidal behavior. A
family-based study found an association between FKBP5 and bipolar disorder, and a secondary analysis linked this to a history of attempted suicide (54). European (55, 57) and Japanese (56) sample cohorts with suicidal behavior (ranging from suicidal ideation to attempted and completed suicide) have also shown associations with FKBP5. Adding more complexity, it has been suggested that FKBP5 variants may interact with childhood trauma to influence suicidal behavior (206, 207). Regulatory variation within FKBP5 may be functionally relevant, as gene and protein expression studies have found altered FKBP5 levels in suicide completers compared to controls (208). In contrast, genome-wide studies have not shown any association between FKBP5 and suicidal behavior (31-35, 117, 118, ). These conflicting results suggest that a more focused interrogation of this gene, including less commonly investigated regions and less common variants, may identify functionally relevant variation. Therefore, in this study, we sequenced both coding and regulatory regions to identify genetic variation in FKBP5 that may increase the risk for suicidal behavior in individuals with bipolar disorder. Following stringent quality control checks, we performed single-variant, gene-level and haplotype tests on the resulting 481 variants. One rare variant remained significant following correction for multiple testing but could not be replicated. Further investigation is required in larger sample sets to fully elucidate the association of this variant with suicidal behavior.

MATERIALS AND METHODS

Ethics Statement

Samples were obtained from the National Institute of Mental Health (NIMH) Genetics Initiative, ( supplementary S1
File) (180) and distributed through Rutgers University's RUCDR Infinite Biologics ( The University of Iowa Institutional Review Board does not consider this project to be human subjects research, because these repository samples were sent to the University of Iowa's laboratory in a de-identified manner.

Subjects

For this study, we used the same sample set as in our attempted suicide genome-wide association study (GWAS) (32). All subjects included in the analysis were European-American and unrelated. Phenotype data were collected using the Diagnostic Interview for Genetic Studies (DIGS; versions (122)), and diagnoses were ascertained as bipolar disorder, type I (BPI) or schizoaffective disorder, bipolar type (SABP). There was no significant difference in average age between suicide attempters and non-attempters (supplementary S1 Table).

Definition of Suicide Subjects

Case subjects were defined as suicide attempters if they answered "yes" to the DIGS question "Have you ever tried to kill yourself?" and had definite or serious intent to complete suicide. This resulted in a total of 476 suicide attempters and 476 non-attempters for sequencing.

Population Stratification

Principal Component Analysis (PCA) was performed using EIGENSTRAT in EIGENSOFT v3.0 (89) on our entire attempted suicide GWAS dataset (32), and the resulting principal components were included as covariates in the logistic regression analyses.

Target Determination

The UCSC Genome Browser databases (UCSC Genes (181), GENCODE Genes (182), Ensembl Genes (184), RefSeq Genes (183) and ENCODE databases (157)) were used to define coding and regulatory regions in or near
FKBP5 transcripts. Targets included: (i) all exons (±50 bases) from all available transcripts in the UCSC Genome Browser databases listed above, (ii) predicted DNase hypersensitivity sites with at least one overlapping transcription factor binding site within all FKBP5 transcripts (±10 kb), identified using ENCODE, and (iii) promoter regions, defined as 2 kb upstream of all transcript start sites. Coding regions encompass any translated exons, while regulatory regions include all other loci covered.

SureSelect Technology

We used a customized SureSelect Target Enrichment System (Agilent Technologies, Santa Clara, CA, U.S.A.) to capture and sequence our chosen targets (as previously described in (52)). Briefly, 3 µg genomic DNA was sheared, prepared into a library using adaptors and hybridized to biotinylated RNA baits. Pre-designed DNA targets were collected using magnetic streptavidin beads and sequenced as 100 bp paired-end reads on the HiSeq2000 at the Iowa Institute of Human Genetics Genomics Division, in pools of sixteen samples.

Sample Processing Pipeline

All samples were processed as a group. Samples were aligned to the Human Feb Assembly (GRCh37/hg19) using the Burrows-Wheeler Aligner (BWA) tool v0.6.2 (128) with a 30 base seed length. Using SAMtools v (129), aligned files were converted to binary (BAM) files and then sorted and indexed. Unpaired, improperly paired and unmapped reads were excluded using BamTools v2.2.3 (129), and duplicate reads were removed using Picard v1.88 ( Further BamTools filtering required a mapping quality score > 20. Genome Analysis Toolkit (GATK) v3.1.1 (85) best practices were used to perform tasks such as realignment around insertions/deletions (indels), base quality score recalibration, variant calling using HaplotypeCaller, and initial
quality checks (such as strand bias and depth ratio). Any single nucleotide polymorphism (SNP) or indel that failed these checks was excluded. Variant calls were labeled as missing if they had a call depth < 10 or a genotyping quality Phred score < 20. Variants were excluded from the final dataset if they had missing calls in > 10% of subjects or a Hardy-Weinberg P-value < 1 x Following a comparison to our previous GWAS genotypes, three non-attempters were excluded before analysis due to mismatches, resulting in a final total of 473 non-attempters for analysis. All variants were annotated using Ensembl v75 datatracks via ANNOVAR (vMarch 22, 2015) (86).

Statistical Analyses

At the single-variant level, we tested 481 FKBP5 variants for an association with attempted suicide using logistic regression with Firth's penalized maximum likelihood method via the logistf R package v1.21 (134). At the gene level, we assessed the association of variation across the FKBP5 locus with suicidal behavior using two minor allele frequency (MAF) thresholds ( 1% and 5%). We determined these thresholds from our own dataset as well as frequency data from the 1000 Genomes Project European samples (vOct2014) (186), Exome Aggregation Consortium non-Finnish European samples (ExAC v0.3 and the NHLBI GO Exome Sequencing Project European samples ( We used two complementary methods to test for associations with variation in the FKBP5 locus. First, we performed a SNP-set (Sequence) Kernel Association Test (SKAT) v1.0.9 (135), which employs a multivariate regression model that accommodates both risk and protective alleles in the tests of association. Second, we implemented a gene-burden test using the previously described Combined Multivariate and Collapsing (CMC) method (133) to collapse all
functional FKBP5 variants by subject, and Firth's penalized logistic regression via the logistf R package v1.21 (134) to test for association. Odds ratios were calculated from the covariate-corrected logistic regression output. The gene-level analyses examined either coding or regulatory variants. Broad coding variants included any disruptive variants, plus non-synonymous variants predicted to be damaging by any one of six software tools: SIFT (159), Polymorphism Phenotyping v2 (PolyPhen-2) HDIV, PolyPhen-2 HVAR (160), LRT (161), VEST (163) or MutationTaster (162). Disruptive coding variants included any that ANNOVAR classified as frameshift, stop-gain/nonsense or essential splice site mutations. Regulatory variants in the gene-level analyses were defined by their predicted ability to alter a regulatory region, as determined by RegulomeDB v1.1 (132). Specifically, broad regulatory variants had a RegulomeDB score of 1-6, and narrow regulatory variants had a more stringent RegulomeDB score of 1 or 2. Sex and the first three principal components from the analysis of the GWAS data were included as covariates in all primary analyses. Statistical significance was assessed by permutation testing: we performed 10,000 permutations of suicide attempter status using custom scripts.

Genotyping

Replication was performed in bipolar disorder suicide attempters with definite/serious intent (N = 328) and bipolar disorder non-attempters (N = 655; supplementary S2 Table). These samples were obtained from the NIMH Genetics Initiative, ( (180) and did not overlap with the initial sample set. Because of repetitive and homopolymer regions surrounding the top result, rs , we used an RNase H-dependent PCR (rhPCR) method (Integrated DNA
Technologies, Coralville, IA, U.S.A.) to genotype this variant in the replication cohort. Briefly, we started with 20 ng genomic DNA and added SYBR GreenER qPCR SuperMix Universal (including ROX dye; Thermo Fisher Scientific, Waltham, MA, U.S.A.), specifically designed GEN2 primers for either the A allele or the C allele, and RNase H2 enzyme (Integrated DNA Technologies, Coralville, IA, U.S.A.). Real-time PCR was then performed on the ViiA7 system, and crossing threshold (Ct) values were used to identify sample genotypes. The most commonly investigated FKBP5 variant, rs , was genotyped in the replication cohort using a TaqMan SNP genotyping assay (C_ _10; Thermo Fisher Scientific, Waltham, MA, U.S.A.).

Haplotype Analysis

To identify common haplotypes, we used Haploview v4.2 (164). Default values were employed except for the minimum MAF, which was set at Blocks were defined using confidence intervals (165).

RESULTS

Dataset Quality

Following sequencing, 96.65% of the targeted sites in the FKBP5 locus had at least 10X coverage, and 100% of targets had at least 1X coverage (supplementary S1 Table). A total of 40,922 bases were covered by the designed probes, and 565 variants were detected within them. The GATK quality control filters excluded 24 variants, and our stringent quality control checks excluded a further 60 variants, resulting in a final high-quality dataset of 481 variants. This final dataset included 13 coding variants, 217 novel variants (not found in dbSNP 142 (214) or other sequencing databases) and 35 common variants (MAF > 0.05).
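The call-level and variant-level filters from the Sample Processing Pipeline are among the stringent checks behind the further 60 excluded variants. They can be sketched in Python as below. This is an illustration under stated assumptions, not the scripts actually used. The input layout and function names are hypothetical, and the GATK filters are assumed to have been applied already. The Hardy-Weinberg filter is omitted because its threshold is not reproduced here.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical per-subject call at one variant: (call_depth, genotype_quality, alt_allele_count)
Call = Tuple[int, int, Optional[int]]


def mask_low_quality(calls: Sequence[Call], min_depth: int = 10, min_gq: int = 20) -> List[Optional[int]]:
    """Label a genotype as missing (None) if call depth < 10 or genotype quality (Phred) < 20."""
    return [
        count if (count is not None and depth >= min_depth and gq >= min_gq) else None
        for depth, gq, count in calls
    ]


def passes_missingness(genotypes: Sequence[Optional[int]], max_missing: float = 0.10) -> bool:
    """Keep a variant unless more than 10% of subjects have a missing call."""
    if not genotypes:
        return False
    n_missing = sum(g is None for g in genotypes)
    return n_missing / len(genotypes) <= max_missing


# Toy variant observed in four subjects: only the first call survives masking,
# so 3 of 4 calls are missing and the variant would be excluded.
masked = mask_low_quality([(30, 99, 0), (8, 99, 1), (25, 15, 2), (40, 60, None)])
print(masked)                      # [0, None, None, None]
print(passes_missingness(masked))  # False
```

Masking individual calls before the missingness check makes low-quality genotypes count toward the 10% threshold. A variant that is poorly called across many subjects is therefore dropped, rather than being kept with unreliable genotypes.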

Single-Variant Analysis

Our primary analysis focused on 481 variants within or near FKBP5 (supplementary S3 Table), five of which were differentially represented among suicide attempters and non-attempters at a nominally significant level (Table 6.1). One variant, rs , showed an excess of minor alleles in suicide attempters that was statistically significant following correction for multiple testing (Odds Ratio = 6.65, P-value = 7.5 x 10⁻⁴, Permuted P-value = 0.038). This variant resides in the intron or 3' UTR of multiple transcripts (Fig 6.1). The minor allele was present in sixteen suicide attempters (MAF=0.017) and two non-attempters (MAF=0.002).

Replication of Results

To further examine the top finding, rs , we attempted to replicate its association with suicidal behavior in an additional cohort of 328 bipolar disorder suicide attempters and 655 bipolar disorder non-attempters. In this sample, the minor allele was carried by five suicide attempters (MAF=0.015) and ten non-attempters (MAF=0.015). Thus, we failed to replicate our initial finding (Odds Ratio=0.90, P-value=0.78).

Replication of Prior Findings

The two most frequently implicated FKBP5 variants in suicide are rs and rs (Fig 6.1). This study did not assess rs , but it was covered in our previous attempted suicide GWAS, where it was not significantly associated with the phenotype (32, 142). In comparison, rs was assessed in our initial sample and did not differ significantly between suicide attempters and non-attempters (MAF in suicide attempters=0.28, MAF in non-attempters=0.29, Odds Ratio=0.94, P-value=0.52, Permuted P-value=1.00). Similarly, it was genotyped in the
additional cohort, and again the results were not significant (MAF in suicide attempters=0.30, MAF in non-attempters=0.28, Odds Ratio=1.07, P-value=0.56).

Sex-Specific Analyses

To investigate whether there was a sex-specific association between FKBP5 and attempted suicide, we performed female-specific and male-specific single-variant analyses (supplementary S4 and S5 Tables). Three female-specific and four male-specific variants differed between suicide attempters and non-attempters at a nominally significant level (supplementary S6 Table). The previously discussed main finding (rs ) was also the top female-specific finding (Odds Ratio = 6.79, P-value = 8.1 x 10⁻³, Permuted P-value = 0.28). The top male-specific variant, rs , showed an increase in significance compared to the main findings (Odds Ratio=0.52, P-value=0.026, Permuted P-value=0.68). No sex-specific findings were statistically significant following correction for multiple testing.

Gene-Level Analysis

Because this study produced a large amount of sequencing data, we were able to perform gene-level analyses to investigate whether FKBP5 variants collectively influence suicidal behavior. To do this, we divided the 481 variants into four groups based on their location and their predicted potential to be damaging: (i) Regulatory Broad, (ii) Regulatory Narrow, (iii) Coding Broad and (iv) Coding Disruptive. The inclusion of two different MAF thresholds allowed us to investigate very rare and less rare variants separately. We performed two distinct gene-level tests: SKAT and gene-burden. Both tests examine all FKBP5 variants for each individual based on the applied thresholds. The gene-burden test employs a logistic regression model to test the association between an indicator variable for whether individuals possess one or
more rare, functional alleles of a specific category and suicide attempt. In comparison, SKAT employs a multivariate model that includes each variant as a covariate to look for significantly different distributions of the variants between suicide attempters and non-attempters. It therefore has the advantage of not being affected by the direction of each variant's effect on risk. No gene-level tests showed evidence of association, regardless of minor allele threshold (Tables 6.2 and 6.3). The top result was from the coding broad analysis using the gene-burden approach (using either MAF threshold; P-value=0.15). The top sex-specific P-value was obtained using SKAT within the female Coding Broad subgroup (using either MAF threshold; P-value=0.057).

Haplotype Analysis

Using Haploview, we identified three haplotype blocks in the FKBP5 region encompassing common variants (MAF > 0.05; Fig 6.2 and Table 6.4). Block 1 covered an 11 kb region encompassing four SNPs, including the extensively studied rs Block 2 covered a 108 kb region and nine SNPs, and Block 3 included thirteen SNPs over a 14 kb region. The top haplotype result was found within Block 3 (P-value = 0.097), but no significant associations between haplotypes and attempted suicide were identified.

DISCUSSION

Genetic variation can alter FKBP5 function, thereby affecting the co-chaperone complex essential to regulating translocation of the GR to the nucleus (205). This may contribute to the over-activation of the HPA axis seen in subjects with suicidal behavior ( ). The goal of this study was to investigate whether variants within the FKBP5 region, separately or together, influence suicidal behavior in individuals with bipolar disorder.

We employed the novel approach of sequencing both coding and regulatory regions using state-of-the-art next-generation sequencing technology. This allowed us to capture more functional areas of FKBP5 and to focus on rare variation, thereby interrogating the gene more fully than our previous investigations could (32, 142). Interestingly, the most significantly associated variant in this study was located near an area with experimentally demonstrated regulatory function (Fig 6.1) (157). Our previous GWAS (32, 142) tested the hypothesis that common variants contribute to the risk for suicidal behavior, but found no significant evidence to this effect for FKBP5. In contrast to our GWAS and most other FKBP5 genetic studies, this study examined both common and rare variation in the coding and regulatory regions of FKBP5 and its contribution to the risk for attempted suicide. However, we were unable to find replicable associations with FKBP5 variants, individually or collectively.

As secondary analyses, we considered the role of sex-specific genetic variation in attempted suicide. Several lines of evidence suggest that suicidal behavior differs between females and males. Specifically, nearly twice as many males as females complete suicide (26), while more females attempt suicide (215, 216). Furthermore, we have previously identified a female-specific common variant associated with suicidal behavior, suggesting that sex-specific variants might influence the risk of this phenotype (32). No statistically significant sex-specific results were found in this study. The female-specific SKAT coding broad test showed the top sex-specific gene-level result (P = 0.057), but larger sample sizes are required to determine whether this trend reflects a true association.

Several candidate variant studies have previously implicated FKBP5 variation in increasing risk for suicidal behavior. In a family study focused on bipolar disorder, four variants displayed increased significance when attempted suicide was included in the analysis as a covariate (54). Another bipolar disorder study genotyped eight common FKBP5 variants and found that a haplotype block containing seven of the variants was associated with attempted suicide (57). A depression study of adolescents on antidepressants focused on two FKBP5 variants. It found that the TT genotype of rs and the GG genotype of rs were significantly associated with suicidal events (attempts, plans to attempt, and ideation), and this finding held even after controlling for treatment (55). A similar study investigating selective serotonin reuptake inhibitor (SSRI) treatment genotyped rs and found that depressed individuals with the minor T allele were at increased risk of suicidal ideation following SSRI treatment (217). Supporting evidence for these studies was provided by Menke et al., who investigated treatment-emergent suicidal ideation (210). A recent study found three FKBP5 variants that were associated with suicide attempt, but not with suicide completion. The authors further identified a haplotype block with increased risk of suicidal behavior when comparing depressed suicide attempters and depressed controls. However, these findings were not corrected for multiple testing (218). A Japanese study identified a statistically significant haplotype block with two FKBP5 variants (rs and rs ) in suicide completers, but did not find any single-variant associations (56). A second, larger cohort of suicide completers was also genotyped for rs and rs , and an association was found with rs (58). It must be noted that the main limitations of all these studies were their small sample sizes and limited genotyping data. These limitations were overcome in genome-wide studies, which investigated suicidal

135 behavior in larger samples and with broader coverage of the gene (31-35, 117, 118, ), but no significant associations were reported in FKBP5 across these studies. One variant of particular interest to the field, rs (54-58, 206, 207, 218) has repeatedly been associated with suicidal behavior, but did not show association in our study (P-value=0.52). This may be due to our case-only approach, which differs from the trio or case-control designs used by others. We utilized this approach to ensure that our results were not influenced by an underlying psychiatric disorder, but this did increase the level of suicidal ideation in the non-attempter cohort. Ideation has been previously associated with FKBP5 variation (55, 217), and thus we may have reduced our power to detect an influence of FKBP5. Another possible explanation for the conflicting results may be that we focused on testing areas around and throughout the gene we surmised would be more likely to disrupt transcription or translation, and therefore we were unable to cover all the previously investigated variants (Fig 6.1). We did not cover intronic variants outside overlapping DNase hypersensitivity sites and transcription factor binding sites that have been associated with this phenotype. In particular, rs , which can alter expression due to its proximity to a GR response element (GRE) (219), was not covered. This study has other limitations. With an estimated 80% power (calculated using Quanto v1.2.4; we were able to detect a single variant with an effect size as low as 2.35 assuming a study-wide significance of 1.04 x 10-4 and MAF of For the replication analysis, we were able to detect a single variant with an effect size of 1.88 assuming a MAF of With a MAF of (seen in the replication results for rs ), an effect 118

136 size of 3.70 can be detected. For the gene-level analyses, we had an estimated 80% power to detect an effect size as low as 1.95 assuming a study-wide significance of 6.3 x 10-3 and MAF of In addition, it should be noted that other biological and environmental factors that impact suicidal behavior, such as epigenetics and early childhood trauma, were not included in this study, but should also be considered alongside these results. Considerable evidence has emerged for an epigenetic component to FKBP5 s role in suicidal behavior, with particular focus on demethylation within and around GREs (220). Furthermore, interacting factors, such as high levels of early childhood trauma, have been shown to be positively associated with increased risk of suicide attempt (206, 207). We did not have access to early childhood history for all subjects, so we could not investigate this gene-by-environment effect. An additional limitation of the replication dataset is the skewed proportion of male suicide attempters (15.85%; supplementary S2 Table) caused by the restricted number of remaining NIMH Genetics Initiative samples. We have completed a large, targeted sequencing project focused on pursuing both coding regions and regulatory regions within FKBP5, a gene previously associated with suicidal behavior. While this study generated several nominally significant results and one statistically significant result, none of them survive correction for multiple testing and/or replication. Thus, it does not provide consistent support for a role of FKBP5 variants increasing the risk of suicidal behavior. The failure to replicate the top finding reinforces the need for larger sample sizes in genetic studies. Greater datasets will more definitively test the relationship between FKBP5 and suicidal behavior. 119
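The power estimates above were calculated with Quanto, whose internal model is not reproduced here. As a rough cross-check, the sketch below uses a standard normal approximation for a two-sided allelic (2x2) test. It assumes a multiplicative allelic model, Hardy-Weinberg equilibrium, no covariates, and that "effect size" means an allelic odds ratio; the function names and the group sizes in the usage lines are hypothetical. It also shows the Bonferroni arithmetic: 0.05/8 = 6.25 x 10^-3, which would match the reported gene-level threshold of 6.3 x 10^-3 if that threshold was a Bonferroni correction over four variant sets at two MAF cutoffs (an inference from Table 6.2, not stated in the text).

```python
from statistics import NormalDist


def bonferroni_alpha(family_alpha: float, n_tests: int) -> float:
    """Per-test threshold that keeps the family-wise error rate at family_alpha."""
    return family_alpha / n_tests


def case_allele_freq(control_maf: float, odds_ratio: float) -> float:
    """Minor-allele frequency in cases implied by an allelic odds ratio."""
    case_odds = odds_ratio * control_maf / (1.0 - control_maf)
    return case_odds / (1.0 + case_odds)


def allelic_test_power(n_cases: int, n_controls: int, control_maf: float,
                       odds_ratio: float, alpha: float) -> float:
    """Approximate power of a two-sided allelic test (normal approximation).

    Assumes Hardy-Weinberg equilibrium (two independent alleles per subject)
    and ignores covariates such as sex and principal components.
    """
    z = NormalDist()
    m1, m0 = 2 * n_cases, 2 * n_controls          # allele counts per group
    p1 = case_allele_freq(control_maf, odds_ratio)
    p0 = control_maf
    p_bar = (m1 * p1 + m0 * p0) / (m1 + m0)        # pooled frequency under H0
    se_null = (p_bar * (1 - p_bar) * (1 / m1 + 1 / m0)) ** 0.5
    se_alt = (p1 * (1 - p1) / m1 + p0 * (1 - p0) / m0) ** 0.5
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Probability that the observed difference clears the critical value
    # (the negligible opposite tail is ignored).
    return z.cdf((abs(p1 - p0) - z_crit * se_null) / se_alt)


if __name__ == "__main__":
    gene_alpha = bonferroni_alpha(0.05, 8)   # four variant sets x two MAF cutoffs
    print(f"gene-level alpha: {gene_alpha:.2e}")
    # Hypothetical group sizes and MAF, for illustration only.
    print(round(allelic_test_power(1000, 1000, 0.05, 2.0, gene_alpha), 3))
```

Because this ignores covariates and uses a simpler model than Quanto, its numbers should be expected to differ somewhat from the figures reported above.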

FUNDING
This project was supported by the NIMH Grant R01 MH awarded to Virginia Willour, the Postdoctoral Fellowship Grant PDF awarded to Marie Breen from the American Foundation for Suicide Prevention, the University of Iowa Medical Scientist Training Program (MSTP) training grant 5 T32 GM awarded to Eric Monson, and support from the University of Iowa Interdisciplinary Graduate Program in Genetics (Sophia Gaynor, Eric Monson, Dr. Willour and Dr. Potash). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

COMPETING INTERESTS
The authors have declared that no competing interests exist.

FIGURES
Figure 6.1: Schematic of the Top Variant in the FKBP5 Gene alongside Previous Findings. The top single-variant result is displayed below the gene; variants or haplotypes with known associations to suicidal behavior are displayed above the gene. Gray variants were not covered by this study. Numbers represent references: 1. (57) 2. (207) 3. (217) 4. (56) 5. (206) 6. (55) 7. (54) 8. (58) 9. (218). All FKBP5 transcripts have been collated to show exons (black vertical lines), introns (black horizontal lines), and untranslated regions (black boxes). A transcription factor binding site (TFBS; green box) and a DNase hypersensitivity site (purple box) upstream of the top variant are labeled (not to scale). GRE denotes glucocorticoid receptor response elements (blue boxes; not to scale).

Figure 6.2: Haplotype Block Structures in the FKBP5 Region. A collated version of all FKBP5 transcripts is shown at the top of the figure. Vertical black lines represent exons, green boxes represent untranslated regions, and purple boxes represent intronic regions. Numbered squares display the D' value; unnumbered squares have a D' of 1.0. Three haplotype blocks were generated and are enclosed by black lines. Lines connect the variants to their approximate locations within the FKBP5 locus.
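The squares in Figure 6.2 report pairwise linkage disequilibrium as D'. As a reminder of what those values measure, the sketch below computes Lewontin's |D'| for two biallelic variants from one haplotype frequency and the two allele frequencies. It assumes the phased haplotype frequency is already known (Haploview estimates it with an EM algorithm, which is not shown here), and the function name is illustrative.

```python
def d_prime(p_ab: float, p_a: float, p_b: float) -> float:
    """Lewontin's |D'| for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A (locus 1) and B (locus 2)
    p_a:  frequency of allele A at locus 1
    p_b:  frequency of allele B at locus 2
    """
    d = p_ab - p_a * p_b                      # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    if d_max == 0:                            # monomorphic locus: D' undefined,
        return 0.0                            # reported as 0.0 by convention here
    return abs(d) / d_max


# Complete LD: allele A always occurs with allele B.
print(d_prime(0.3, 0.3, 0.3))
# Independence: haplotype frequency equals the product of allele frequencies.
print(d_prime(0.06, 0.2, 0.3))
```

Normalizing D by its maximum attainable value is what lets D' reach 1.0 (the unnumbered squares) even when the two variants have very different allele frequencies.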

TABLES
Table 6.1: Single-Variant Results with a P-Value < 0.05
Columns: Variant (a) | Chromosomal Position (b) | Location (c) | P-value (d) | Permuted P-value | Odds Ratio (e) | Odds Ratio 95% Confidence Interval (Lower, Upper) | Minor Allele Frequency (Suicide Attempters, Non-Attempters)
rs | chr6: | Intronic/3' UTR | 7.5 x
rs | chr6: | Intronic/3' UTR | 1.02 x
rs | chr6: | Intronic
rs | chr6: | Intronic
rs | chr6: | 'UTR
(a) Annotated from dbSNP 142.
(b) Using UCSC Genome Browser Human Feb (GRCh37/hg19) Assembly.
(c) Including all transcripts as determined by UCSC Genome Browser databases (UCSC Genes, GENCODE Genes, Ensembl Genes, RefSeq Genes, and ENCODE databases).
(d) Corrected for sex and the first three principal components.
(e) Odds ratios shown are for the minor allele.
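Table 6.1 reports permuted P-values alongside the asymptotic ones. The permutation scheme is not described in this section; the sketch below shows one generic approach, assuming attempter/non-attempter labels are shuffled while genotypes stay fixed. It uses a simple difference in mean minor-allele count as the statistic rather than the sex- and principal-component-adjusted model behind the reported P-values, and all names and data are hypothetical.

```python
import random
from typing import Callable, Sequence

Statistic = Callable[[Sequence[int], Sequence[bool]], float]


def permutation_p_value(genotypes: Sequence[int], is_case: Sequence[bool],
                        statistic: Statistic, n_permutations: int = 10_000,
                        seed: int = 0) -> float:
    """Label-permutation P-value: (1 + #permuted >= observed) / (1 + #permutations)."""
    rng = random.Random(seed)
    observed = statistic(genotypes, is_case)
    labels = list(is_case)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(labels)                   # break the genotype-phenotype link
        if statistic(genotypes, labels) >= observed:
            hits += 1
    return (hits + 1) / (n_permutations + 1)


def allele_count_difference(genotypes: Sequence[int], is_case: Sequence[bool]) -> float:
    """Absolute difference in mean minor-allele count, cases vs. non-cases."""
    cases = [g for g, c in zip(genotypes, is_case) if c]
    others = [g for g, c in zip(genotypes, is_case) if not c]
    return abs(sum(cases) / len(cases) - sum(others) / len(others))


genotypes = [2] * 20 + [0] * 20          # toy minor-allele counts
is_case = [True] * 20 + [False] * 20     # toy attempter labels
print(permutation_p_value(genotypes, is_case, allele_count_difference,
                          n_permutations=999, seed=1))
```

Adding one to both numerator and denominator keeps the permuted P-value strictly above zero, so its smallest attainable value is bounded by the number of permutations run.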

Table 6.2: Gene-Level Results Using Two Minor Allele Frequency Thresholds
Columns: SKAT P-value (a) at MAF 1% and MAF 5%; Burden P-value (a) at MAF 1% and MAF 5%; Odds Ratio at MAF 1% and MAF 5%
Regulatory Broad (b)
Regulatory Narrow (c)
Coding Broad (d)
Coding Disruptive (e): N/A N/A
N/A denotes values that could not be computed.
(a) Corrected for sex and the first three principal components.
(b) Predicted to be damaging to regulatory regions with a score of 6 by RegulomeDB.
(c) Predicted to be damaging to regulatory regions with a score of 2 by RegulomeDB.
(d) Predicted to be damaging to coding regions by at least one of six software programs (SIFT, PolyPhen-2 (HDIV and HVAR), LRT, MutationTaster, and VEST) or considered disruptive.
(e) Considered to be an essential splicing variant, frameshift insertion/deletion, or stop-gain variant using ANNOVAR.
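The four variant sets in Table 6.2 are defined by the footnotes above. The sketch below shows one way such filters might be expressed. It assumes the RegulomeDB footnotes denote a score of 6 or lower (broad) and 2 or lower (narrow), that disruptive variants are recognized from ANNOVAR's Func.refGene and ExonicFunc.refGene labels (the label spellings used here are assumptions to check against the ANNOVAR version in use), and that the six predictor calls have already been tallied per variant. The Variant class and its fields are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Set

# ANNOVAR ExonicFunc.refGene labels treated as disruptive (assumed spellings).
DISRUPTIVE_EXONIC = {"stopgain", "frameshift insertion", "frameshift deletion"}


@dataclass
class Variant:
    name: str
    regulome_score: Optional[str] = None   # e.g. "1f", "2b", "6"; None if unscored
    func: str = ""                         # ANNOVAR Func.refGene, e.g. "splicing"
    exonic_func: str = ""                  # ANNOVAR ExonicFunc.refGene, e.g. "stopgain"
    damaging_calls: int = 0                # predictors (of six) calling it damaging


def is_disruptive(v: Variant) -> bool:
    """Essential splice site, frameshift indel, or stop-gain."""
    return v.func == "splicing" or v.exonic_func in DISRUPTIVE_EXONIC


def variant_sets(v: Variant) -> Set[str]:
    """Return the Table 6.2-style variant sets this variant would enter."""
    sets: Set[str] = set()
    if v.regulome_score:
        category = int(v.regulome_score[0])   # "2b" -> 2; lower = stronger evidence
        if category <= 6:
            sets.add("regulatory_broad")
        if category <= 2:
            sets.add("regulatory_narrow")
    if is_disruptive(v):
        sets.update({"coding_disruptive", "coding_broad"})
    elif v.damaging_calls >= 1:
        sets.add("coding_broad")
    return sets


print(sorted(variant_sets(Variant("example_1", regulome_score="2b"))))
print(sorted(variant_sets(Variant("example_2", exonic_func="stopgain"))))
```

Note that the sets are nested by construction: every narrow regulatory variant is also broad, and every disruptive variant is also counted in Coding Broad.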

Table 6.3: Sex-Specific Gene-Level Results Using Two Minor Allele Frequency Thresholds
Columns (reported separately for Female and Male subjects): SKAT P-value (a) at MAF 1% and MAF 5%; Burden P-value (a) at MAF 1% and MAF 5%; Odds Ratio at MAF 1% and MAF 5%
Regulatory Broad (b)
Regulatory Narrow (c)
Coding Broad (d)
Coding Disruptive (e): N/A N/A N/A N/A N/A N/A N/A N/A
N/A denotes values that could not be computed.
(a) Corrected for the first three principal components.
(b) Predicted to be damaging to regulatory regions with a score of 6 by RegulomeDB.
(c) Predicted to be damaging to regulatory regions with a score of 2 by RegulomeDB.
(d) Predicted to be damaging to coding regions by at least one of six software programs (SIFT, PolyPhen-2 (HDIV and HVAR), LRT, MutationTaster, and VEST) or considered disruptive.
(e) Considered to be an essential splicing variant, frameshift insertion/deletion, or stop-gain variant using ANNOVAR.
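Table 6.3 repeats the gene-level tests within each sex at MAF cutoffs of 1% and 5%. The collapsing step that precedes a burden test can be sketched as follows: subjects are restricted to one sex, variants whose MAF within that stratum falls below the cutoff are kept, and each subject's minor-allele counts at those variants are summed. This is a simplified illustration assuming genotypes are coded as minor-allele counts (0, 1, 2) and that MAF is computed within the stratum; the regression of attempter status on the resulting scores with principal-component covariates is not shown, and all names and data are hypothetical.

```python
from typing import Dict, List, Sequence


def stratified_burden(genotypes: Dict[str, Sequence[int]],
                      sexes: Sequence[str],
                      stratum: str,
                      maf_cutoff: float) -> List[int]:
    """Per-subject rare-variant burden within one sex stratum.

    genotypes:  variant id -> minor-allele counts (0/1/2), one per subject
    sexes:      sex label per subject, in the same order as the genotypes
    stratum:    which sex label to analyze, e.g. "F" or "M"
    maf_cutoff: keep variants with 0 < MAF < maf_cutoff within this stratum
    """
    keep = [i for i, s in enumerate(sexes) if s == stratum]
    if not keep:
        return []
    qualifying = []
    for calls in genotypes.values():
        sub = [calls[i] for i in keep]
        maf = sum(sub) / (2 * len(sub))
        if 0 < maf < maf_cutoff:
            qualifying.append(sub)
    # Sum minor-allele counts across qualifying variants for each subject.
    return [sum(sub[j] for sub in qualifying) for j in range(len(keep))]


genotypes = {"var1": [1, 0, 0, 0, 0, 0, 0, 0],
             "var2": [1, 1, 2, 1, 1, 0, 1, 1]}
sexes = ["F", "F", "F", "F", "M", "M", "M", "M"]
for sex in ("F", "M"):
    # A 20% cutoff is used only because the toy sample is tiny.
    print(sex, stratified_burden(genotypes, sexes, sex, maf_cutoff=0.2))
```

Computing MAF within each stratum means a variant can qualify in one sex and not the other, which is one reason sex-specific burden results can diverge.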

Table 6.4: Haplotype Results Generated Using Haploview
Columns: Haplotype; Haplotype Frequency; Suicide Attempter Frequency; Non-attempter Frequency; Chi Square; Uncorrected P-value
Block 1: CCCA, CCCC, CACC, AAAC
Block 2: CCCCCCCCC, CACCCCACA, CCCACCCAC, ACACAAACA, CCCCCCCAC, CCCCCCACA, CCACCCCCA, CCCCCCCCA
Block 3: ACCCCACCACCCC, CCACCCAACAACC, CACCACCCCCCCC, CCCCCCCACAAAA, CCAACCAACAACC, CCCCCCCCCCCCC, CCACCCCACAAAA, CCACCACCACCCC
All default values were used apart from the minimum minor allele frequency, which was changed to
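Each row of a haplotype association table like Table 6.4 can be read as one haplotype compared against all other haplotypes in its block, giving a 2x2 table of chromosome counts in attempters versus non-attempters. The sketch below computes a Pearson chi-square (1 degree of freedom, no continuity correction) and its P-value from such counts, assuming that framing. It assumes the counts are already available; in Haploview they derive from EM-estimated haplotype frequencies, so this is an illustration rather than a reimplementation, and the function name and numbers are hypothetical.

```python
import math
from typing import Tuple


def haplotype_chi_square(hap_in_cases: float, chrom_cases: float,
                         hap_in_controls: float, chrom_controls: float) -> Tuple[float, float]:
    """Pearson chi-square (1 df) for one haplotype versus all others.

    hap_in_*: (possibly fractional) count of chromosomes carrying the haplotype
    chrom_*:  total chromosomes in the group (2 x number of subjects)
    """
    a, b = hap_in_cases, chrom_cases - hap_in_cases
    c, d = hap_in_controls, chrom_controls - hap_in_controls
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:                      # a margin is empty: no information
        return 0.0, 1.0
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Survival function of a 1-df chi-square: P(|Z| > sqrt(chi2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


chi2, p = haplotype_chi_square(30, 100, 15, 100)
print(f"chi-square = {chi2:.3f}, P = {p:.4f}")
```

Using `math.erfc` avoids a SciPy dependency, since a 1-df chi-square is the square of a standard normal variable.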

SUPPLEMENTAL INFORMATION
Supplementary figures and tables referenced in this article (53) can be accessed on the publisher's website,

CHAPTER 7, CONCLUSION

SUMMARY
This dissertation has presented the results of several explorations of rare functional variation within SB and BD. These investigations collectively represent the largest existing efforts to assess the contribution of rare functional variation to the risk of these severe psychiatric phenotypes. We had hoped that these efforts would identify strong evidence of an association of rare functional variation with SB and BD. However, no individual variants or genes were identified with association signals that survived correction for multiple testing in either our hypothesis-free or hypothesis-driven analyses. These results are consistent with initial power estimates for exome sequencing analyses of complex traits, which concluded that, as with GWAS, larger sample sizes will likely be required to reliably and significantly identify associations at specific loci (221).

We were, however, still able to answer several important questions regarding the contribution of rare functional variation to SB and BD. First, our detailed data allowed us to consider the relationship of our top findings across multiple psychiatric phenotypes. By fully leveraging the familial and case/control data within our large exome-sequencing dataset, we identified significant evidence that genes nominally associated with BD in our data were enriched for de novo autism risk genes and schizophrenia risk loci. These findings continue to improve our understanding of biological pathways that, when disrupted, may contribute to the risk of multiple psychiatric diseases. With further investigation, such shared sites could inform our understanding of pathogenesis and potential treatment targets for these diseases and offer insights into developmental or functional pathways of the brain that have not been fully appreciated. Conversely, our suggestive results for the SB and BD project revealed unique risk loci for both phenotypes with very little overlap in the signals (Figure 7.1). This observation suggests that, for these suggestive findings, the sites are largely specific to each phenotype despite the shared samples and consistent psychiatric background of the investigations. The identification of both overlapping and unique loci for our psychiatric phenotypes supports the continued construction and thoughtful analysis of large, well-annotated samples for sequencing, which can advance our understanding of multiple phenotypes simultaneously.

Second, our results effectively rule out Mendelian/high-penetrance contributions to the risk of SB from rare functional variants detected within the human exome. This adds to considerable empirical evidence supporting the long-held view that there is unlikely to be a "gene for" complex psychiatric phenotypes (222). Instead, our results predominantly point toward more complicated disruptions involving many risk loci within biologically related pathways. Such findings offer valuable insights that may aid in the design of future investigations of SB.

Third, we were able to specifically and robustly test existing hypotheses regarding several SB candidate genes and pathways that have garnered considerable attention in the field. Many of these candidates have inconsistent association evidence that may be related, in part, to differing methods and variants assessed between studies. The coding and regulatory sequencing data produced for the assessed candidate genes represent the most comprehensive assessments, to date, of each of these genes. Our results, therefore, allow a critical evaluation of current hypotheses related to these genes and pathways for SB. In addition, our findings underscore the potential importance of exploring non-coding sequence and broadening the scope of future candidate gene and pathway analyses.

Beyond these answers, the data produced from these efforts also offer considerable value to the field in several ways. First, the novel, though nominal, associations of rare functional variation within SB and BD provide potential targets for future candidate gene and pathway analyses. These novel targets could considerably broaden the scope of traditional candidate gene assessments for SB and BD. Hypothesis-driven analyses making use of these identified genes and pathways would also reduce the multiple testing burden and potentially increase the likelihood of identifying significant associations in future exome- and genome-wide studies. Second, the high-quality data produced from our initial analyses will allow direct integration of our results into future meta- or mega-analyses as sequencing consortia form. Such data integration will allow significantly more powerful analyses and a better chance of detecting reliable risk loci, as has been observed within GWAS for several psychiatric phenotypes (13, 66, 70). Finally, the high level of novelty among the detected rare functional variants will allow valuable additions to key existing variant databases used to annotate sequencing efforts and to create rare-variant genotyping arrays. These databases and tools provide the means for future economical exploration of rare functional variation without requiring additional costly sequencing (181, 214, 223).
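The enrichment of BD-associated genes among autism and schizophrenia risk genes described in the summary can be framed as an overlap test between two gene lists drawn from a common gene universe. The specific test behind that result is not described in this chapter; the sketch below shows a one-sided hypergeometric test, one common choice, with hypothetical argument names and toy numbers.

```python
from math import comb


def hypergeometric_enrichment_p(universe: int, n_hits: int,
                                n_reference: int, overlap: int) -> float:
    """One-sided P(overlap >= observed) under random sampling without replacement.

    universe:    number of genes that could appear in either list
    n_hits:      genes nominally associated in the study (e.g. with BD)
    n_reference: genes in the external list (e.g. de novo autism risk genes)
    overlap:     genes present in both lists
    """
    total = comb(universe, n_hits)
    # math.comb returns 0 when k exceeds n, so impossible terms drop out.
    tail = sum(comb(n_reference, k) * comb(universe - n_reference, n_hits - k)
               for k in range(overlap, min(n_hits, n_reference) + 1))
    return tail / total


# Toy example: 20,000 genes, 500 nominal hits, 100 reference genes, 10 shared.
print(hypergeometric_enrichment_p(20_000, 500, 100, 10))
```

Exact integer arithmetic avoids underflow in the binomial coefficients; the result depends heavily on how the gene universe is defined, which is a design choice any real analysis has to state.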

LIMITATIONS
There are also a number of questions that we could not answer due to study design and technological limitations. These limitations are discussed specifically within each presented study, but a broader critical assessment is applied here. This assessment examines limitations common to the studies relating to the selected sequencing methodologies, the sample, the types of variation assessed, and the statistical methods. In addition, we describe our rationale for these study design decisions and alternatives to them.

The first limitation, imposed by our study design, was the choice to use whole exome sequencing (WES) and targeted sequencing methods rather than whole genome sequencing (WGS) to generate our datasets. Our rationale for selecting targeted approaches focused on three key points: interpretability, data quality, and economy. WES deeply sequences targets within well-described coding regions of the genome, allowing high-confidence genotype calls in the targeted regions (224). Therefore, WES provides a dataset that is far more straightforward to annotate, process, store, and interpret, with significantly less cost and time investment than WGS. Despite these strengths, WES covers only ~1% of the complete genomic sequence (224), which prevents us from commenting on the possibility of large-magnitude effect loci associated with either SB or BD within the vast non-coding regions of the genome. We aimed to partially address this information gap within candidate genes for SB by targeted sequencing of additional sites with evidence of regulatory function within or near these genes (40, 157). These targeted regions, however, still suffer from the inherent weakness that they represent targets rather than comprehensive coverage, and they leave extensive regions within the candidate gene loci unassessed, particularly within larger candidate genes. In addition, our exclusive reliance on sequencing data further limits our analyses. Specifically, we did not collect epigenetic or expression data as part of this study. These data could offer important evidence of regulatory perturbations that may be associated with genetic variation for both SB and BD (225, 226). Such findings could help explain studies that have identified differentially expressed genes not well explained by genetic variation alone, as in the case of SAT1 in SB (42, 44-49).

The second limitation imposed by our study design relates to the generalizability of the sample assessed, especially in the case of SB. The sample represents a large pool of European-American individuals diagnosed with BD and/or closely related psychiatric phenotypes, coupled with extensive phenotypic data, including SB history (154, 155, 180). We selected such a highly homogeneous population because it allows better control of potential confounding factors (32). However, such a focused group of individuals poses two primary limitations to the generalizability of the presented studies. First, this population limits the ability of the studies to inform on risk factors beyond European-American ancestry, making any predictions for the much more diverse general population less reliable. Second, because rare variation in the SB studies was investigated strictly in individuals with BD, interpretation of those results is limited to this population.

Third, despite careful consideration of which subjects to include in our primary analyses, the selected phenotype definitions for SB and BD have the potential to introduce genetic heterogeneity. Most individuals assessed were diagnosed with bipolar disorder type I (BPI), with a small subset diagnosed as schizoaffective bipolar type (SA/BP) or bipolar not otherwise specified (BP-NOS). There is past evidence that SA/BP subjects are very similar to BPI subjects (180, 227), and BP-NOS represents a phenotype within the spectrum of BD. We included these SA/BP and BP-NOS subjects because they were included in previous BD studies that utilized the same study sample and applied rigorous assessments of genetic homogeneity among these subjects (32, 228). In addition, all of the SB studies presented within this dissertation examine individuals with a history of suicide attempt rather than subjects who died by suicide. Our rationale for selecting these subjects for an SB investigation rests on two primary points: the reliability of the phenotype as a predictor of death by suicide, and the availability of individuals to study. First, suicide attempt has been identified as the strongest single predictor of death by suicide (26, 229). Second, there are many more suicide attempts than deaths by suicide (26). We further refined the pool of those with past suicide attempts by including in the study analyses only individuals with definite to serious intent to die, as defined during the subject interviews. However, because our study uses suicide attempt as the sole representation of SB, it remains to be seen whether our results are consistent in individuals who die by suicide. There is also significant evidence that past environmental exposures, such as a history of child abuse or other severe stress events, can play an important role in risk for developing both phenotypes (121, 171, ). However, we chose not to include any gene-by-environment analyses. This decision was made because our subjects' histories of environmental exposures were inconsistent or incomplete, which would have greatly limited the number of subjects that could be included in such assessments.

The fourth limitation of our study design centers on the types of variation used in our analyses. Our analyses focused primarily on rare variants that were predicted to be deleterious. These criteria excluded potentially important sources of variation from all gene and pathway analyses presented, including synonymous variation (234), common variation (regardless of functional predictions) (235), and non-coding sites that lacked annotation. In addition, we largely excluded detected insertion/deletion (indel) sites from analyses, with the exception of the SB targeted sequencing candidate gene studies, which used a more reliable calling method for indel sites. We also did not screen any of the assessed data for potential copy number variants (CNVs). All of these sources of variation were excluded due to concerns regarding the technical limitations of accurately calling these variants in targeted next-generation sequencing (indels and CNVs) (236, 237), the difficulty of interpreting an association result (synonymous coding or unannotated non-coding variation), or the limitations of the analytical methods in appropriately handling common variation. It is also important to note that, even within the sites that were assessed, the annotations used to identify potentially damaging variants are based on algorithms and data that continue to evolve. For example, existing annotation for potentially functional non-coding variation is typically the product of merging data from many different tissue types, as seen in CADD (238) and in the ENCODE (157) data tracks of the UCSC Genome Browser (181), which leaves their predictive value in a single tissue, such as the brain, in question. As tissue-specific datasets continue to grow in size, and methods for calling, annotating, and analyzing sequencing data continue to evolve, re-evaluation of existing datasets may identify new results, as has been observed within clinical sequencing efforts (239).

Finally, we chose to apply a complementary but focused collection of statistical models to evaluate association with SB and BD. The selected models and the targeted tests of variants, genes, and pathways represent only a small fraction of the possible options for assessing exome and targeted sequencing data. As gene and pathway databases continue to grow and new methods of analysis are created, additional productive analyses will likely be possible within our existing data (239).

FUTURE DIRECTIONS
Only a few significant risk loci have been identified within existing large SB (240) and BD (14) GWAS. The existing evidence from this effort, among others, establishes a need for additional samples to confidently detect novel risk loci and confirm those already identified. This need is currently being addressed by efforts to combine the resources generated by sequencing projects into collaborative consortia (241). Such combined datasets will allow more powerful direct and replication analyses of the rare functional sites that were the focus of this dissertation. Larger samples alone, however, are not the only solution for identifying key risk loci. The analytical methods available to assess exome- and genome-wide data continue to be developed and refined, and, as mentioned within the limitations of this dissertation, such advancements may allow additional fruitful examinations of existing datasets (239). In addition, basic science efforts continue to expand our understanding of the potential involvement of genes, pharmacological agents, and environmental influences in diverse biological processes that are relevant to the genesis and progression of complex human diseases, such as cancer (242). As these processes are identified and described within the framework of psychiatric disease, better hypothesis-driven tests within large datasets may also identify additional risk loci.

As reliable risk loci are discovered, it will also be essential to functionally assess their biological and clinical relevance (243). Initial steps toward such assessments include the use of rapidly advancing gene editing techniques, such as CRISPR/Cas (244), in human cell lines and in high-throughput animal model systems such as C. elegans or Drosophila. These systems can be employed rapidly and economically to introduce suspected risk loci (in human cell lines) or to perturb conserved genes and/or pathways (in model organisms) and observe any resulting measurable effects (245). In addition, such systems allow controlled exposure to pharmacological agents to assess the potential efficacy of treatments in reversing observed phenotypes on a given genetic background, as has been applied within cancer research efforts (246). Preliminary results from such rapid screening systems could support expanded investigations in more complex animal models, clinical trials, and eventual assessment in humans.

CONCLUSION
These efforts, though substantial and currently the largest of their kind within SB and BD, did not identify significant evidence of large-effect-size rare functional variation in individual variants or genes. Significant evidence was identified, however, in support of an overlap of autism and BD risk loci, adding to a growing list of loci that may be important to the development of multiple psychiatric phenotypes. We assert that the initial efforts presented within this dissertation represent key first steps in the investigation of rare functional variation within SB and BD. Several trends and suggestive findings hint at moderate-effect-size variation that falls below the detection threshold of this sample and that, with expanded sequencing efforts, may contribute significantly to our understanding of SB and BD. In addition, the valuable and highly novel datasets produced from these efforts, coupled with answers to several basic questions regarding the role of rare functional variation within SB and BD, demonstrate the need for initial studies such as those presented. Overall, our observations match those of other similarly powered studies of complex psychiatric disease (79, 247), which indicate the need for larger and carefully selected cohorts to elucidate the genetic underpinnings of SB and BD. Such work, coupled with functional assays to test the biological relevance of identified associations, may provide critical direction toward better targeted and more efficacious treatments and better risk and prognostic assessment for these devastating phenotypes.

FIGURES
Figure 7.1: Overlap of Top Gene Results from SB and BD RareBLISS Data. This plot shows the level of overlap between top gene results for the SB and BD assessments of the RareBLISS data at various nominal significance thresholds. The columns represent the counts of total unique gene results (striped) and total overlapping gene results (solid black).
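The counts plotted in Figure 7.1 can be tallied from two tables of gene-level P-values. The sketch below assumes that "unique" means genes passing a threshold in only one of the two analyses, that "overlapping" means genes passing it in both, and that a gene passes when its P-value is strictly below the threshold; the gene names and P-values are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def overlap_by_threshold(sb_p: Dict[str, float], bd_p: Dict[str, float],
                         thresholds: Iterable[float]) -> List[Tuple[float, int, int]]:
    """For each threshold, return (threshold, unique-gene count, overlapping-gene count)."""
    rows = []
    for t in thresholds:
        sb = {gene for gene, p in sb_p.items() if p < t}
        bd = {gene for gene, p in bd_p.items() if p < t}
        # ^ keeps genes passing in exactly one analysis; & keeps genes passing in both.
        rows.append((t, len(sb ^ bd), len(sb & bd)))
    return rows


sb_p = {"GENE_A": 0.01, "GENE_B": 0.20, "GENE_C": 0.04}
bd_p = {"GENE_A": 0.03, "GENE_D": 0.001}
for threshold, unique, shared in overlap_by_threshold(sb_p, bd_p, [0.05, 0.01]):
    print(threshold, unique, shared)
```

If "unique" in the figure instead counts all distinct genes across both analyses, replacing `sb ^ bd` with `sb | bd` gives that total.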

APPENDIX A: WORKFLOW FOR THE RAREBLISS EXOME-SEQUENCING PROJECT
Figure A.1: Data Preparation Workflow and Contributions.

Figure A.2: Bipolar Disorder Exome Project Statistical Analyses and Contributions.

Figure A.3: Suicide Exome Project Statistical Analyses and Contributions.

APPENDIX B: WORKFLOW FOR THE SB TARGETED CANDIDATE GENE SEQUENCING PROJECT
Figure B.1: Data Preparation Workflow and Contributions.

Figure B.2: SB Candidate Gene Project Statistical Analyses and Contributions.


White Paper Estimating Complex Phenotype Prevalence Using Predictive Models

White Paper Estimating Complex Phenotype Prevalence Using Predictive Models White Paper 23-12 Estimating Complex Phenotype Prevalence Using Predictive Models Authors: Nicholas A. Furlotte Aaron Kleinman Robin Smith David Hinds Created: September 25 th, 2015 September 25th, 2015

More information

GENOME-WIDE ASSOCIATION STUDIES

GENOME-WIDE ASSOCIATION STUDIES GENOME-WIDE ASSOCIATION STUDIES SUCCESSES AND PITFALLS IBT 2012 Human Genetics & Molecular Medicine Zané Lombard IDENTIFYING DISEASE GENES??? Nature, 15 Feb 2001 Science, 16 Feb 2001 IDENTIFYING DISEASE

More information

Nature Genetics: doi: /ng Supplementary Figure 1. PCA for ancestry in SNV data.

Nature Genetics: doi: /ng Supplementary Figure 1. PCA for ancestry in SNV data. Supplementary Figure 1 PCA for ancestry in SNV data. (a) EIGENSTRAT principal-component analysis (PCA) of SNV genotype data on all samples. (b) PCA of only proband SNV genotype data. (c) PCA of SNV genotype

More information

Golden Helix s End-to-End Solution for Clinical Labs

Golden Helix s End-to-End Solution for Clinical Labs Golden Helix s End-to-End Solution for Clinical Labs Steven Hystad - Field Application Scientist Nathan Fortier Senior Software Engineer 20 most promising Biotech Technology Providers Top 10 Analytics

More information

Advance Your Genomic Research Using Targeted Resequencing with SeqCap EZ Library

Advance Your Genomic Research Using Targeted Resequencing with SeqCap EZ Library Advance Your Genomic Research Using Targeted Resequencing with SeqCap EZ Library Marilou Wijdicks International Product Manager Research For Life Science Research Only. Not for Use in Diagnostic Procedures.

More information

SNPrints: Defining SNP signatures for prediction of onset in complex diseases

SNPrints: Defining SNP signatures for prediction of onset in complex diseases SNPrints: Defining SNP signatures for prediction of onset in complex diseases Linda Liu, Biomedical Informatics, Stanford University Daniel Newburger, Biomedical Informatics, Stanford University Grace

More information

Psychiatric genetics 2025: the need to focus on childhood, adolescence, and life

Psychiatric genetics 2025: the need to focus on childhood, adolescence, and life Psychiatric genetics 2025: the need to focus on childhood, adolescence, and life Thomas G. Schulze, MD Institute of Psychiatric Phenomics and Genomics (IPPG), Ludwig-Maximilians-University Munich, Germany

More information

Professional Counseling Psychology

Professional Counseling Psychology Professional Counseling Psychology Regulations for Case Conceptualization Preparation Manual Revised Spring 2015 Table of Contents Timeline... 3 Committee Selection and Paperwork... 3 Selection of Client

More information

BST227 Introduction to Statistical Genetics. Lecture 4: Introduction to linkage and association analysis

BST227 Introduction to Statistical Genetics. Lecture 4: Introduction to linkage and association analysis BST227 Introduction to Statistical Genetics Lecture 4: Introduction to linkage and association analysis 1 Housekeeping Homework #1 due today Homework #2 posted (due Monday) Lab at 5:30PM today (FXB G13)

More information

Introduction to linkage and family based designs to study the genetic epidemiology of complex traits. Harold Snieder

Introduction to linkage and family based designs to study the genetic epidemiology of complex traits. Harold Snieder Introduction to linkage and family based designs to study the genetic epidemiology of complex traits Harold Snieder Overview of presentation Designs: population vs. family based Mendelian vs. complex diseases/traits

More information

Global variation in copy number in the human genome

Global variation in copy number in the human genome Global variation in copy number in the human genome Redon et. al. Nature 444:444-454 (2006) 12.03.2007 Tarmo Puurand Study 270 individuals (HapMap collection) Affymetrix 500K Whole Genome TilePath (WGTP)

More information

SUPPLEMENTARY INFORMATION

SUPPLEMENTARY INFORMATION doi:10.1038/nature13908 Supplementary Tables Supplementary Table 1: Families in this study (.xlsx) All families included in the study are listed. For each family, we show: the genders of the probands and

More information

American Psychiatric Nurses Association

American Psychiatric Nurses Association Francis J. McMahon International Society of Psychiatric Genetics Johns Hopkins University School of Medicine Dept. of Psychiatry Human Genetics Branch, National Institute of Mental Health* * views expressed

More information

Genetics and Genomics in Medicine Chapter 8 Questions

Genetics and Genomics in Medicine Chapter 8 Questions Genetics and Genomics in Medicine Chapter 8 Questions Linkage Analysis Question Question 8.1 Affected members of the pedigree above have an autosomal dominant disorder, and cytogenetic analyses using conventional

More information

A dissertation by. Clare Rachel Watsford

A dissertation by. Clare Rachel Watsford Young People s Expectations, Preferences and Experiences of Seeking Help from a Youth Mental Health Service and the Effects on Clinical Outcome, Service Use and Future Help-Seeking Intentions A dissertation

More information

Genetics and Pharmacogenetics in Human Complex Disorders (Example of Bipolar Disorder)

Genetics and Pharmacogenetics in Human Complex Disorders (Example of Bipolar Disorder) Genetics and Pharmacogenetics in Human Complex Disorders (Example of Bipolar Disorder) September 14, 2012 Chun Xu M.D, M.Sc, Ph.D. Assistant professor Texas Tech University Health Sciences Center Paul

More information

An Introduction to Quantitative Genetics I. Heather A Lawson Advanced Genetics Spring2018

An Introduction to Quantitative Genetics I. Heather A Lawson Advanced Genetics Spring2018 An Introduction to Quantitative Genetics I Heather A Lawson Advanced Genetics Spring2018 Outline What is Quantitative Genetics? Genotypic Values and Genetic Effects Heritability Linkage Disequilibrium

More information

White Paper Guidelines on Vetting Genetic Associations

White Paper Guidelines on Vetting Genetic Associations White Paper 23-03 Guidelines on Vetting Genetic Associations Authors: Andro Hsu Brian Naughton Shirley Wu Created: November 14, 2007 Revised: February 14, 2008 Revised: June 10, 2010 (see end of document

More information

Human Genetics 542 Winter 2018 Syllabus

Human Genetics 542 Winter 2018 Syllabus Human Genetics 542 Winter 2018 Syllabus Monday, Wednesday, and Friday 9 10 a.m. 5915 Buhl Course Director: Tony Antonellis Jan 3 rd Wed Mapping disease genes I: inheritance patterns and linkage analysis

More information

Burning debate: What s the best way to nab real autism genes?

Burning debate: What s the best way to nab real autism genes? OPINION, VIEWPOINT Burning debate: What s the best way to nab real autism genes? BY BRIAN O'ROAK 27 JUNE 2017 Over the past 10 years researchers have made tremendous progress in understanding the genetic

More information

Copyright is owned by the Author of the thesis. Permission is given for a copy to be downloaded by an individual for the purpose of research and

Copyright is owned by the Author of the thesis. Permission is given for a copy to be downloaded by an individual for the purpose of research and Copyright is owned by the Author of the thesis. Permission is given for a copy to be downloaded by an individual for the purpose of research and private study only. The thesis may not be reproduced elsewhere

More information

AVENIO family of NGS oncology assays ctdna and Tumor Tissue Analysis Kits

AVENIO family of NGS oncology assays ctdna and Tumor Tissue Analysis Kits AVENIO family of NGS oncology assays ctdna and Tumor Tissue Analysis Kits Accelerating clinical research Next-generation sequencing (NGS) has the ability to interrogate many different genes and detect

More information

MULTIFACTORIAL DISEASES. MG L-10 July 7 th 2014

MULTIFACTORIAL DISEASES. MG L-10 July 7 th 2014 MULTIFACTORIAL DISEASES MG L-10 July 7 th 2014 Genetic Diseases Unifactorial Chromosomal Multifactorial AD Numerical AR Structural X-linked Microdeletions Mitochondrial Spectrum of Alterations in DNA Sequence

More information

Epigenetics. Jenny van Dongen Vrije Universiteit (VU) Amsterdam Boulder, Friday march 10, 2017

Epigenetics. Jenny van Dongen Vrije Universiteit (VU) Amsterdam Boulder, Friday march 10, 2017 Epigenetics Jenny van Dongen Vrije Universiteit (VU) Amsterdam j.van.dongen@vu.nl Boulder, Friday march 10, 2017 Epigenetics Epigenetics= The study of molecular mechanisms that influence the activity of

More information

SCALPEL MICRO-ASSEMBLY APPROACH TO DETECT INDELS WITHIN EXOME-CAPTURE DATA. Giuseppe Narzisi, PhD Schatz Lab

SCALPEL MICRO-ASSEMBLY APPROACH TO DETECT INDELS WITHIN EXOME-CAPTURE DATA. Giuseppe Narzisi, PhD Schatz Lab SCALPEL MICRO-ASSEMBLY APPROACH TO DETECT INDELS WITHIN EXOME-CAPTURE DATA Giuseppe Narzisi, PhD Schatz Lab November 14, 2013 Micro-Assembly Approach to detect INDELs 2 Outline Scalpel micro-assembly pipeline

More information

BroadcastMed Bipolar, Borderline, Both? Diagnostic/Formulation Issues in Mood and Personality Disorders

BroadcastMed Bipolar, Borderline, Both? Diagnostic/Formulation Issues in Mood and Personality Disorders BroadcastMed Bipolar, Borderline, Both? Diagnostic/Formulation Issues in Mood and Personality Disorders BRIAN PALMER: Hi. My name is Brian Palmer. I'm a psychiatrist here at Mayo Clinic. Today, we'd like

More information

Human Genetics 542 Winter 2017 Syllabus

Human Genetics 542 Winter 2017 Syllabus Human Genetics 542 Winter 2017 Syllabus Monday, Wednesday, and Friday 9 10 a.m. 5915 Buhl Course Director: Tony Antonellis Module I: Mapping and characterizing simple genetic diseases Jan 4 th Wed Mapping

More information

AVENIO ctdna Analysis Kits The complete NGS liquid biopsy solution EMPOWER YOUR LAB

AVENIO ctdna Analysis Kits The complete NGS liquid biopsy solution EMPOWER YOUR LAB Analysis Kits The complete NGS liquid biopsy solution EMPOWER YOUR LAB Analysis Kits Next-generation performance in liquid biopsies 2 Accelerating clinical research From liquid biopsy to next-generation

More information

PERSONALIZED GENETIC REPORT CLIENT-REPORTED DATA PURPOSE OF THE X-SCREEN TEST

PERSONALIZED GENETIC REPORT CLIENT-REPORTED DATA PURPOSE OF THE X-SCREEN TEST INCLUDED IN THIS REPORT: REVIEW OF YOUR GENETIC INFORMATION RELEVANT TO ENDOMETRIOSIS PERSONAL EDUCATIONAL INFORMATION RELEVANT TO YOUR GENES INFORMATION FOR OBTAINING YOUR ENTIRE X-SCREEN DATA FILE PERSONALIZED

More information

Title: Pinpointing resilience in Bipolar Disorder

Title: Pinpointing resilience in Bipolar Disorder Title: Pinpointing resilience in Bipolar Disorder 1. AIM OF THE RESEARCH AND BRIEF BACKGROUND Bipolar disorder (BD) is a mood disorder characterised by episodes of depression and mania. It ranks as one

More information

Quantitative genetics: traits controlled by alleles at many loci

Quantitative genetics: traits controlled by alleles at many loci Quantitative genetics: traits controlled by alleles at many loci Human phenotypic adaptations and diseases commonly involve the effects of many genes, each will small effect Quantitative genetics allows

More information

AD (Leave blank) TITLE: Genomic Characterization of Brain Metastasis in Non-Small Cell Lung Cancer Patients

AD (Leave blank) TITLE: Genomic Characterization of Brain Metastasis in Non-Small Cell Lung Cancer Patients AD (Leave blank) Award Number: W81XWH-12-1-0444 TITLE: Genomic Characterization of Brain Metastasis in Non-Small Cell Lung Cancer Patients PRINCIPAL INVESTIGATOR: Mark A. Watson, MD PhD CONTRACTING ORGANIZATION:

More information

Aggregation of psychopathology in a clinical sample of children and their parents

Aggregation of psychopathology in a clinical sample of children and their parents Aggregation of psychopathology in a clinical sample of children and their parents PA R E N T S O F C H I LD R E N W I T H PSYC H O PAT H O LO G Y : PSYC H I AT R I C P R O B LEMS A N D T H E A S SO C I

More information

Research Article Power Estimation for Gene-Longevity Association Analysis Using Concordant Twins

Research Article Power Estimation for Gene-Longevity Association Analysis Using Concordant Twins Genetics Research International, Article ID 154204, 8 pages http://dx.doi.org/10.1155/2014/154204 Research Article Power Estimation for Gene-Longevity Association Analysis Using Concordant Twins Qihua

More information

PROGRESS: Beginning to Understand the Genetic Predisposition to PSC

PROGRESS: Beginning to Understand the Genetic Predisposition to PSC PROGRESS: Beginning to Understand the Genetic Predisposition to PSC Konstantinos N. Lazaridis, MD Associate Professor of Medicine Division of Gastroenterology and Hepatology Associate Director Center for

More information

Genetics of Behavior (Learning Objectives)

Genetics of Behavior (Learning Objectives) Genetics of Behavior (Learning Objectives) Recognize that behavior is multi-factorial with genetic components Understand how multi-factorial traits are studied. Explain the terms: incidence, prevalence,

More information

Dr Rick Tearle Senior Applications Specialist, EMEA Complete Genomics Complete Genomics, Inc.

Dr Rick Tearle Senior Applications Specialist, EMEA Complete Genomics Complete Genomics, Inc. Dr Rick Tearle Senior Applications Specialist, EMEA Complete Genomics Topics Overview of Data Processing Pipeline Overview of Data Files 2 DNA Nano-Ball (DNB) Read Structure Genome : acgtacatgcattcacacatgcttagctatctctcgccag

More information

National Surgical Adjuvant Breast and Bowel Project (NSABP) Foundation Annual Progress Report: 2009 Formula Grant

National Surgical Adjuvant Breast and Bowel Project (NSABP) Foundation Annual Progress Report: 2009 Formula Grant National Surgical Adjuvant Breast and Bowel Project (NSABP) Foundation Annual Progress Report: 2009 Formula Grant Reporting Period July 1, 2011 June 30, 2012 Formula Grant Overview The National Surgical

More information

Statistical Genetics : Gene Mappin g through Linkag e and Associatio n

Statistical Genetics : Gene Mappin g through Linkag e and Associatio n Statistical Genetics : Gene Mappin g through Linkag e and Associatio n Benjamin M Neale Manuel AR Ferreira Sarah E Medlan d Danielle Posthuma About the editors List of contributors Preface Acknowledgements

More information

Genetics of Behavior (Learning Objectives)

Genetics of Behavior (Learning Objectives) Genetics of Behavior (Learning Objectives) Recognize that behavior is multi-factorial with genetic components Understand how multi-factorial traits are studied. Explain the terms: prevalence, incidence,

More information

Goal: To identify the extent to which different aspects of psychopathology might be in some way inherited

Goal: To identify the extent to which different aspects of psychopathology might be in some way inherited Key Dates TH Mar 30 Unit 19; Term Paper Step 2 TU Apr 4 Begin Biological Perspectives, Unit IIIA and 20; Step 2 Assignment TH Apr 6 Unit 21 TU Apr 11 Unit 22; Biological Perspective Assignment TH Apr 13

More information

An expanded view of complex traits: from polygenic to omnigenic

An expanded view of complex traits: from polygenic to omnigenic BIRS 2017 An expanded view of complex traits: from polygenic to omnigenic How does human genetic variation drive variation in complex traits/disease risk? Yang I Li Stanford University Evan Boyle Jonathan

More information

DEMOGRAPHIC, PSYCHOSOCIAL, AND EDUCATIONAL FACTORS RELATED TO FRUIT AND VEGETABLE CONSUMPTION IN ADULTS. Gloria J. Stables

DEMOGRAPHIC, PSYCHOSOCIAL, AND EDUCATIONAL FACTORS RELATED TO FRUIT AND VEGETABLE CONSUMPTION IN ADULTS. Gloria J. Stables DEMOGRAPHIC, PSYCHOSOCIAL, AND EDUCATIONAL FACTORS RELATED TO FRUIT AND VEGETABLE CONSUMPTION IN ADULTS By Gloria J. Stables Dissertation submitted to the Faculty of the Virginia Polytechnic Institute

More information

Statistical power and significance testing in large-scale genetic studies

Statistical power and significance testing in large-scale genetic studies STUDY DESIGNS Statistical power and significance testing in large-scale genetic studies Pak C. Sham 1 and Shaun M. Purcell 2,3 Abstract Significance testing was developed as an objective method for summarizing

More information

Chapter 18 Genetics of Behavior. Chapter 18 Human Heredity by Michael Cummings 2006 Brooks/Cole-Thomson Learning

Chapter 18 Genetics of Behavior. Chapter 18 Human Heredity by Michael Cummings 2006 Brooks/Cole-Thomson Learning Chapter 18 Genetics of Behavior Behavior Most human behaviors are polygenic and have significant environmental influences Methods used to study inheritance include Classical methods of linkage and pedigree

More information

ACE ImmunoID Biomarker Discovery Solutions ACE ImmunoID Platform for Tumor Immunogenomics

ACE ImmunoID Biomarker Discovery Solutions ACE ImmunoID Platform for Tumor Immunogenomics ACE ImmunoID Biomarker Discovery Solutions ACE ImmunoID Platform for Tumor Immunogenomics Precision Genomics for Immuno-Oncology Personalis, Inc. ACE ImmunoID When one biomarker doesn t tell the whole

More information

Developing and evaluating polygenic risk prediction models for stratified disease prevention

Developing and evaluating polygenic risk prediction models for stratified disease prevention Developing and evaluating polygenic risk prediction models for stratified disease prevention Nilanjan Chatterjee 1 3, Jianxin Shi 3 and Montserrat García-Closas 3 Abstract Knowledge of genetics and its

More information

Multiplex target enrichment using DNA indexing for ultra-high throughput variant detection

Multiplex target enrichment using DNA indexing for ultra-high throughput variant detection Multiplex target enrichment using DNA indexing for ultra-high throughput variant detection Dr Elaine Kenny Neuropsychiatric Genetics Research Group Institute of Molecular Medicine Trinity College Dublin

More information

Personalis ACE Clinical Exome The First Test to Combine an Enhanced Clinical Exome with Genome- Scale Structural Variant Detection

Personalis ACE Clinical Exome The First Test to Combine an Enhanced Clinical Exome with Genome- Scale Structural Variant Detection Personalis ACE Clinical Exome The First Test to Combine an Enhanced Clinical Exome with Genome- Scale Structural Variant Detection Personalis, Inc. 1350 Willow Road, Suite 202, Menlo Park, California 94025

More information

Non-Mendelian inheritance

Non-Mendelian inheritance Non-Mendelian inheritance Focus on Human Disorders Peter K. Rogan, Ph.D. Laboratory of Human Molecular Genetics Children s Mercy Hospital Schools of Medicine & Computer Science and Engineering University

More information

During the hyperinsulinemic-euglycemic clamp [1], a priming dose of human insulin (Novolin,

During the hyperinsulinemic-euglycemic clamp [1], a priming dose of human insulin (Novolin, ESM Methods Hyperinsulinemic-euglycemic clamp procedure During the hyperinsulinemic-euglycemic clamp [1], a priming dose of human insulin (Novolin, Clayton, NC) was followed by a constant rate (60 mu m

More information

Fragile X Syndrome. Genetics, Epigenetics & the Role of Unprogrammed Events in the expression of a Phenotype

Fragile X Syndrome. Genetics, Epigenetics & the Role of Unprogrammed Events in the expression of a Phenotype Fragile X Syndrome Genetics, Epigenetics & the Role of Unprogrammed Events in the expression of a Phenotype A loss of function of the FMR-1 gene results in severe learning problems, intellectual disability

More information

MEDICAL GENOMICS LABORATORY. Next-Gen Sequencing and Deletion/Duplication Analysis of NF1 Only (NF1-NG)

MEDICAL GENOMICS LABORATORY. Next-Gen Sequencing and Deletion/Duplication Analysis of NF1 Only (NF1-NG) Next-Gen Sequencing and Deletion/Duplication Analysis of NF1 Only (NF1-NG) Ordering Information Acceptable specimen types: Fresh blood sample (3-6 ml EDTA; no time limitations associated with receipt)

More information

Supplementary Figures

Supplementary Figures Supplementary Figures Supplementary Fig 1. Comparison of sub-samples on the first two principal components of genetic variation. TheBritishsampleisplottedwithredpoints.The sub-samples of the diverse sample

More information

Summary & general discussion

Summary & general discussion Summary & general discussion 160 chapter 8 The aim of this thesis was to identify genetic and environmental risk factors for behavioral problems, in particular Attention Problems (AP) and Attention Deficit

More information

Finding the Missing Heritability: Gene Mapping Strategies for Complex Pedigrees. Kaanan Pradeep Shah

Finding the Missing Heritability: Gene Mapping Strategies for Complex Pedigrees. Kaanan Pradeep Shah Finding the Missing Heritability: Gene Mapping Strategies for Complex Pedigrees by Kaanan Pradeep Shah A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy

More information

Misheck Ndebele. Johannesburg

Misheck Ndebele. Johannesburg APPLICATION OF THE INFORMATION, MOTIVATION AND BEHAVIOURAL SKILLS (IMB) MODEL FOR TARGETING HIV-RISK BEHAVIOUR AMONG ADOLESCENT LEARNERS IN SOUTH AFRICA Misheck Ndebele A thesis submitted to the Faculty

More information

CURRENT GENETIC TESTING TOOLS IN NEONATAL MEDICINE. Dr. Bahar Naghavi

CURRENT GENETIC TESTING TOOLS IN NEONATAL MEDICINE. Dr. Bahar Naghavi 2 CURRENT GENETIC TESTING TOOLS IN NEONATAL MEDICINE Dr. Bahar Naghavi Assistant professor of Basic Science Department, Shahid Beheshti University of Medical Sciences, Tehran,Iran 3 Introduction Over 4000

More information

Kirk Wilson. Acupuncture as an Adjunct Therapy in the Treatment of Depression. Doctor of Philosophy

Kirk Wilson. Acupuncture as an Adjunct Therapy in the Treatment of Depression. Doctor of Philosophy Kirk Wilson Acupuncture as an Adjunct Therapy in the Treatment of Depression Doctor of Philosophy 2014 i Certificate of Original Authorship I certify that the work in this thesis has not previously been

More information

September 20, Submitted electronically to: Cc: To Whom It May Concern:

September 20, Submitted electronically to: Cc: To Whom It May Concern: History Study (NOT-HL-12-147), p. 1 September 20, 2012 Re: Request for Information (RFI): Building a National Resource to Study Myelodysplastic Syndromes (MDS) The MDS Cohort Natural History Study (NOT-HL-12-147).

More information

University of Groningen. Metabolic risk in people with psychotic disorders Bruins, Jojanneke

University of Groningen. Metabolic risk in people with psychotic disorders Bruins, Jojanneke University of Groningen Metabolic risk in people with psychotic disorders Bruins, Jojanneke IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's PDF) if you wish to cite from

More information

AN EXPLORATORY STUDY OF LEADER-MEMBER EXCHANGE IN CHINA, AND THE ROLE OF GUANXI IN THE LMX PROCESS

AN EXPLORATORY STUDY OF LEADER-MEMBER EXCHANGE IN CHINA, AND THE ROLE OF GUANXI IN THE LMX PROCESS UNIVERSITY OF SOUTHERN QUEENSLAND AN EXPLORATORY STUDY OF LEADER-MEMBER EXCHANGE IN CHINA, AND THE ROLE OF GUANXI IN THE LMX PROCESS A Dissertation submitted by Gwenda Latham, MBA For the award of Doctor

More information

THE IMPACTS OF HIV RELATED STIGMA ON CHILDREN INFECTED AND AFFECTED WITH HIV AMONG THE CARE AND SHARE PROJECT OF THE FREE

THE IMPACTS OF HIV RELATED STIGMA ON CHILDREN INFECTED AND AFFECTED WITH HIV AMONG THE CARE AND SHARE PROJECT OF THE FREE THE IMPACTS OF HIV RELATED STIGMA ON CHILDREN INFECTED AND AFFECTED WITH HIV AMONG THE CARE AND SHARE PROJECT OF THE FREE METHODIST CHURCH, ANDHERI EAST, IN MUMBAI BY STELLA G. BOKARE A Dissertation Submitted

More information

Heritability and genetic correlations explained by common SNPs for MetS traits. Shashaank Vattikuti, Juen Guo and Carson Chow LBM/NIDDK

Heritability and genetic correlations explained by common SNPs for MetS traits. Shashaank Vattikuti, Juen Guo and Carson Chow LBM/NIDDK Heritability and genetic correlations explained by common SNPs for MetS traits Shashaank Vattikuti, Juen Guo and Carson Chow LBM/NIDDK The Genomewide Association Study. Manolio TA. N Engl J Med 2010;363:166-176.

More information

IS IT GENETIC? How do genes, environment and chance interact to specify a complex trait such as intelligence?

IS IT GENETIC? How do genes, environment and chance interact to specify a complex trait such as intelligence? 1 IS IT GENETIC? How do genes, environment and chance interact to specify a complex trait such as intelligence? Single-gene (monogenic) traits Phenotypic variation is typically discrete (often comparing

More information

REPORT TO CONGRESS Multi-Disciplinary Brain Research and Data Sharing Efforts September 2013 The estimated cost of report or study for the Department of Defense is approximately $2,540 for the 2013 Fiscal

More information

Copyright is owned by the Author of the thesis. Permission is given for a copy to be downloaded by an individual for the purpose of research and

Copyright is owned by the Author of the thesis. Permission is given for a copy to be downloaded by an individual for the purpose of research and Copyright is owned by the Author of the thesis. Permission is given for a copy to be downloaded by an individual for the purpose of research and private study only. The thesis may not be reproduced elsewhere

More information

Predicting and facilitating upward family communication as a mammography promotion strategy

Predicting and facilitating upward family communication as a mammography promotion strategy University of Wollongong Research Online University of Wollongong Thesis Collection 1954-2016 University of Wollongong Thesis Collections 2010 Predicting and facilitating upward family communication as

More information

Introduction of Genome wide Complex Trait Analysis (GCTA) Presenter: Yue Ming Chen Location: Stat Gen Workshop Date: 6/7/2013

Introduction of Genome wide Complex Trait Analysis (GCTA) Presenter: Yue Ming Chen Location: Stat Gen Workshop Date: 6/7/2013 Introduction of Genome wide Complex Trait Analysis (GCTA) resenter: ue Ming Chen Location: Stat Gen Workshop Date: 6/7/013 Outline Brief review of quantitative genetics Overview of GCTA Ideas Main functions

More information

Identifying the Zygosity Status of Twins Using Bayes Network and Estimation- Maximization Methodology

Identifying the Zygosity Status of Twins Using Bayes Network and Estimation- Maximization Methodology Identifying the Zygosity Status of Twins Using Bayes Network and Estimation- Maximization Methodology Yicun Ni (ID#: 9064804041), Jin Ruan (ID#: 9070059457), Ying Zhang (ID#: 9070063723) Abstract As the

More information

Help-seeking behaviour for emotional or behavioural problems. among Australian adolescents: the role of socio-demographic

Help-seeking behaviour for emotional or behavioural problems. among Australian adolescents: the role of socio-demographic Help-seeking behaviour for emotional or behavioural problems among Australian adolescents: the role of socio-demographic characteristics and mental health problems Kerry A. Ettridge Discipline of Paediatrics

More information

Guidelines for Making Changes to DSM-V Revised 10/21/09 Kenneth Kendler, David Kupfer, William Narrow, Katharine Phillips, Jan Fawcett,

Guidelines for Making Changes to DSM-V Revised 10/21/09 Kenneth Kendler, David Kupfer, William Narrow, Katharine Phillips, Jan Fawcett, Guidelines for Making Changes to DSM-V Revised 10/21/09 Kenneth Kendler, David Kupfer, William Narrow, Katharine Phillips, Jan Fawcett, Table of Contents Page 1. Overview.1 2. Criteria for Change in the

More information

ANALYSIS AND CLASSIFICATION OF EEG SIGNALS. A Dissertation Submitted by. Siuly. Doctor of Philosophy

ANALYSIS AND CLASSIFICATION OF EEG SIGNALS. A Dissertation Submitted by. Siuly. Doctor of Philosophy UNIVERSITY OF SOUTHERN QUEENSLAND, AUSTRALIA ANALYSIS AND CLASSIFICATION OF EEG SIGNALS A Dissertation Submitted by Siuly For the Award of Doctor of Philosophy July, 2012 Abstract Electroencephalography

More information

Nature Methods: doi: /nmeth.3115

Nature Methods: doi: /nmeth.3115 Supplementary Figure 1 Analysis of DNA methylation in a cancer cohort based on Infinium 450K data. RnBeads was used to rediscover a clinically distinct subgroup of glioblastoma patients characterized by

More information

A guide to understanding variant classification

A guide to understanding variant classification White paper A guide to understanding variant classification In a diagnostic setting, variant classification forms the basis for clinical judgment, making proper classification of variants critical to your

More information

p e r s p e c t i v e

p e r s p e c t i v e n e u r o g e n o m i c s Genome-scale neurogenetics: methodology and meaning Steven A McCarroll 1,2, Guoping Feng 1,3,4 & Steven E Hyman 1,5 Genetic analysis is currently offering glimpses into molecular

More information

IN SILICO EVALUATION OF DNA-POOLED ALLELOTYPING VERSUS INDIVIDUAL GENOTYPING FOR GENOME-WIDE ASSOCIATION STUDIES OF COMPLEX DISEASE.

IN SILICO EVALUATION OF DNA-POOLED ALLELOTYPING VERSUS INDIVIDUAL GENOTYPING FOR GENOME-WIDE ASSOCIATION STUDIES OF COMPLEX DISEASE. IN SILICO EVALUATION OF DNA-POOLED ALLELOTYPING VERSUS INDIVIDUAL GENOTYPING FOR GENOME-WIDE ASSOCIATION STUDIES OF COMPLEX DISEASE By Siddharth Pratap Thesis Submitted to the Faculty of the Graduate School

More information

Missing Heritablility How to Analyze Your Own Genome Fall 2013

Missing Heritablility How to Analyze Your Own Genome Fall 2013 Missing Heritablility 02-223 How to Analyze Your Own Genome Fall 2013 Heritability Heritability: the propor>on of observed varia>on in a par>cular trait (as height) that can be agributed to inherited gene>c

More information

Case Studies on High Throughput Gene Expression Data Kun Huang, PhD Raghu Machiraju, PhD

Case Studies on High Throughput Gene Expression Data Kun Huang, PhD Raghu Machiraju, PhD Case Studies on High Throughput Gene Expression Data Kun Huang, PhD Raghu Machiraju, PhD Department of Biomedical Informatics Department of Computer Science and Engineering The Ohio State University Review

More information

!"##"$%#"&!'&$'()$(%&'*& Terapia Pediatrica e Farmacologia dello Sviluppo +,-./&01,23&34,53& :&;.<&2-.=;:3&;.;2>6-6&-.&;&

!##$%#&!'&$'()$(%&'*& Terapia Pediatrica e Farmacologia dello Sviluppo +,-./&01,23&34,53& :&;.<&2-.=;:3&;.;2>6-6&-.&;& !!! "#$%&'($)*!+&,-$!.)/+$!+$!01,-$1'$!!"##"$%#"&!'&$'()$(%&'*& Terapia Pediatrica e Farmacologia dello Sviluppo 0$2-3!44566!! +,-./&01,23&34,53&63783.9-.:&;.6-6&-.&;& 582/-:3.3?;/-,.;2&@;5-2>&63:?3:;/-.:&#>A3&B&!-;C3/36&!.&))3'&!(2$&#)$7$23!+$(2$8-$#1'&!+$!177&'&#91!!

More information

Van test naar diagnose naar

Van test naar diagnose naar Van test naar diagnose naar V therapie op maat Marjolein Kriek, LUMC Joris Veltman, RUNMC Exome diagnostics in genetically heterogeneous disease Joris Veltman, PhD Department of Human Genetics Radboud

More information

Mapping of genes causing dyslexia susceptibility Clyde Francks Wellcome Trust Centre for Human Genetics University of Oxford Trinity 2001

Mapping of genes causing dyslexia susceptibility Clyde Francks Wellcome Trust Centre for Human Genetics University of Oxford Trinity 2001 Mapping of genes causing dyslexia susceptibility Clyde Francks Wellcome Trust Centre for Human Genetics University of Oxford Trinity 2001 Thesis submitted for the degree of Doctor of Philosophy Mapping

More information