Genome-wide Association Analysis Applied to Asthma-Susceptibility Gene
McCaw, Z., Wu, W., Hsiao, S., McKhann, A., Tracy, S.
December 17, 2014

1 Introduction

Asthma is a chronic respiratory disease affecting more than 300 million people worldwide [1]. Previous studies of familial aggregation and segregation show that asthma has substantial heritability [2]. Asthma does not exhibit the inheritance pattern of a classic Mendelian disorder. Rather, it is a complex, multifactorial disease with multiple interacting genetic and environmental components, as demonstrated by recent linkage analyses, association studies, and genome-wide association studies [4].

There is no consensus on the mode of inheritance of asthma, and association studies have implemented different genetic models, among which co-dominant and additive models are common [3][4]. For example, the European Community Respiratory Health Survey employed a co-dominant model in its asthma association analysis, whereas additive models are common in twin studies [5][6][7][8][9][10]. Co-dominant models are advantageous in that they are non-parametric, allowing each genotype a different penetrance. Parametric models such as the additive model are more powerful when correctly specified but lose power when misspecified. Therefore, we considered both additive and co-dominant analyses. We performed a case-control association analysis of GWAS data to identify alleles that are associated with asthma or in linkage disequilibrium with disease susceptibility loci.

2 Study Methods

2.1 Study Population

The cases for this study were drawn from the Childhood Asthma Management Program (CAMP), a clinical trial designed to determine the long-term effects of three asthma treatments [11]. All participants had mild to moderate asthma and provided DNA for genetic studies. Because the number of available cases was limited, and in order to increase power to find associations, the CAMP probands were compared to controls from Illumina's iControlDB resource. After initial quality control, 359 cases and 846 controls made up the CAMP/Illumina study.
Genome-wide SNP genotyping for all subjects was performed on Illumina's HumanHap550 Genotyping BeadChip [1].

2.2 Quality Control

Basic quality control was performed on the CAMP/Illumina study data before our analysis. We used a number of additional criteria to further filter the subjects and SNPs in our data set. We could not check for Mendel errors because no family data were available for the cases or controls. No subjects needed to be removed after checking for missing phenotypes, incorrectly classified sex, or more than 5% of SNPs missing. No SNP was missing in more than 5% of subjects. SNPs were removed if they met either of the following criteria: a Hardy-Weinberg equilibrium p-value among controls less than 0.01 [n=9,040] or a minor allele frequency (MAF) less than 5% [n=82,022]. Filtering resulted in 467,401 SNPs and 1205 subjects (359 cases and 846 controls).

Using the filtered dataset, we checked for population stratification by considering the genomic inflation factor. Its value of 1.32 indicated that population stratification was present in our subjects.
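
For reference, the genomic inflation factor is commonly defined as the ratio of the median observed 1-degree-of-freedom chi-square association statistic to the median of the corresponding null distribution (about 0.455). The following Python sketch illustrates that definition only. The function name and the toy statistics are hypothetical, and this is not the software used to obtain the value reported above.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom (about 0.4549).
# If Z is standard normal, Z**2 is chi-square(1), so the median of Z**2 is the
# square of the 75th percentile of Z.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_inflation_factor(chisq_stats):
    """Return median(observed 1-df chi-square statistics) / null median."""
    if not chisq_stats:
        raise ValueError("need at least one test statistic")
    return median(chisq_stats) / CHI2_1DF_MEDIAN


# Toy statistics (hypothetical, for illustration only).
toy_stats = [0.1, CHI2_1DF_MEDIAN, 3.0]
# The same statistics uniformly inflated by a factor of 1.32.
toy_inflated = [1.32 * s for s in toy_stats]

lambda_null = genomic_inflation_factor(toy_stats)
lambda_inflated = genomic_inflation_factor(toy_inflated)
```

Under this definition a value near 1 indicates little inflation of the test statistics, while a uniformly inflated set of statistics, as in `toy_inflated`, returns the inflation factor itself.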

Using a subset of 9,884 SNPs in approximate linkage equilibrium, obtained by linkage disequilibrium based pruning, we computed the principal components. After adjusting for principal components, the genomic inflation factor was 1.02, so our final dataset exhibited minimal population stratification. Figure 1 shows the projection of the data for all 1205 subjects onto the first 3 principal components. There is no evidence of separation along any of these principal component directions, so adjusting for principal components should be more than sufficient to correct for stratification.

Figure 1: MDS Asthma

3 Analysis and Results

Table 1: Plink results

CHR  SNP         BP        HWE P  Minor  Case Freq.  Cont. Freq.  Major  CHISQ  P         OR    SE    L95   U95
5    rs4700355   59368006  0.68   G      0.23        0.34         A      26.71  2.37E-07  0.59  0.10  0.48  0.72
5    rs1508864   59388568  0.60   C      0.23        0.34         T      26.82  2.24E-07  0.59  0.10  0.48  0.72
5    rs1508859   59389854  0.60   T      0.23        0.34         C      26.82  2.24E-07  0.59  0.10  0.48  0.72
5    rs7731007   59399588  0.60   G      0.23        0.34         T      26.82  2.24E-07  0.59  0.10  0.48  0.72
5    rs10461667  59404215  0.60   C      0.23        0.34         T      26.82  2.24E-07  0.59  0.10  0.48  0.72
5    rs1588265   59405551  0.60   G      0.23        0.34         A      26.82  2.24E-07  0.59  0.10  0.48  0.72
5    rs13164971  59414410  0.60   C      0.23        0.34         A      26.82  2.24E-07  0.59  0.10  0.48  0.72
5    rs2136203   59418081  0.60   C      0.23        0.34         T      26.54  2.58E-07  0.59  0.10  0.48  0.72
5    rs1100918   59422372  0.60   G      0.23        0.34         A      26.82  2.24E-07  0.59  0.10  0.48  0.72
5    rs2662444   59429941  0.60   G      0.23        0.34         T      26.82  2.24E-07  0.59  0.10  0.48  0.72
13   rs9546395   82927920  0.20   C      0.42        0.31         T      23.16  1.49E-06  1.56  0.09  1.30  1.86

Our analysis began by considering the possible genetic models under which to test for association. From a consideration of the literature, we determined that a recessive model would not make sense to test. We performed logistic regression controlling for sex and the first two MDS components under both an additive and a codominant model.
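
The additive and codominant models differ only in how the genotype enters the logistic regression. A minimal Python sketch of the two codings follows, assuming genotypes are stored as minor-allele counts (0, 1, or 2). The function names and the covariate layout are illustrative assumptions, not the PLINK implementation.

```python
def _check_count(minor_allele_count):
    """Reject anything that is not a valid biallelic genotype count."""
    if minor_allele_count not in (0, 1, 2):
        raise ValueError("minor allele count must be 0, 1, or 2")


def additive_coding(minor_allele_count):
    """One predictor: the number of minor alleles carried."""
    _check_count(minor_allele_count)
    return [float(minor_allele_count)]


def codominant_coding(minor_allele_count):
    """Two indicator predictors: heterozygote, minor-allele homozygote."""
    _check_count(minor_allele_count)
    return [
        1.0 if minor_allele_count == 1 else 0.0,
        1.0 if minor_allele_count == 2 else 0.0,
    ]


def design_row(minor_allele_count, sex, mds1, mds2, model="additive"):
    """Intercept, genotype term(s), then sex and two MDS components."""
    if model == "additive":
        genotype_terms = additive_coding(minor_allele_count)
    elif model == "codominant":
        genotype_terms = codominant_coding(minor_allele_count)
    else:
        raise ValueError("model must be 'additive' or 'codominant'")
    return [1.0] + genotype_terms + [float(sex), float(mds1), float(mds2)]


# Hypothetical subject: heterozygous, sex coded 1, made-up MDS coordinates.
row_additive = design_row(1, sex=1, mds1=0.02, mds2=-0.01, model="additive")
row_codominant = design_row(1, sex=1, mds1=0.02, mds2=-0.01, model="codominant")
```

The codominant coding estimates a separate effect for each non-reference genotype at the cost of one extra parameter per SNP, whereas the additive coding constrains the log-odds to change by the same amount with each additional minor allele.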
Under the codominant model, none of the SNPs reached significance, so our further analysis focused on the tests under the additive model. We considered 10 significant SNPs, determined through genomic control corrected p-values and a cut-off of 1.0E-6 (Table 1). No SNP reached significance after Bonferroni correction. These p-values are displayed on the -log10 scale in a Manhattan plot in Figure 2.
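
To make the Bonferroni comparison concrete, the per-SNP threshold can be worked out from the number of SNPs retained after filtering. The sketch below assumes a family-wise error rate of 0.05, which the text does not state, and takes the smallest P in Table 1 at face value.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold for a family-wise error rate alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


N_SNPS = 467_401       # SNPs remaining after filtering (Section 2.2)
ALPHA = 0.05           # assumed family-wise error rate; not given in the text
SMALLEST_P = 2.24e-07  # smallest P listed in Table 1

threshold = bonferroni_threshold(ALPHA, N_SNPS)
passes_bonferroni = SMALLEST_P < threshold

print(f"{threshold:.3g}")  # 1.07e-07
print(passes_bonferroni)   # False
```

Under these assumptions the threshold is roughly 1.07E-7, so even the smallest p-value in Table 1 falls just short of it while remaining below the 1.0E-6 cut-off.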

Figure 2: Manhattan Plot

The QQ-plot of the quantiles of the p-values against the theoretical quantiles of the uniform distribution does not suggest systematic positive bias (Figure 3). The deviation of observed from predicted quantiles at elevated significance levels could indicate true positive associations. Clusters of adjacent SNPs with elevated significance likely stem from local linkage disequilibrium.

Figure 3: QQ-Plot

Using the UCSC Genome Browser, we mapped these 10 SNPs onto the genome and found that the top ranked SNP (rs9546395 on chromosome 13) is not located in a gene, nor are its nearest neighboring protein-coding genes (SLITRK1, SPRY2, NDFIP2) known to have functions associated with asthma. The remaining 9 SNPs all lie in the gene PDE4D (variant 1, chr5:59,986,401-60,488,098), the same gene identified by Himes et al. in their GWA study. We generated a linkage disequilibrium plot in HaploView showing nearly perfect LD between the 9 significant SNPs on chromosome 5. All pairs of SNPs have D' = 1 and r^2 >= 99% (Figure 4).
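
As a reminder of what the two LD statistics measure, the following Python sketch computes the normalized disequilibrium coefficient D' and the squared correlation r^2 for a pair of biallelic SNPs from haplotype and allele frequencies. The function name and the input frequencies are hypothetical and chosen only for illustration; they are not taken from the HaploView output.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D', r^2) for two biallelic SNPs.

    p_a, p_b: frequencies of allele A at SNP 1 and allele B at SNP 2.
    p_ab:     frequency of the haplotype carrying both A and B.
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("both SNPs must be polymorphic")
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_a * (1.0 - p_b), (1.0 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1.0 - p_a) * (1.0 - p_b))
    d_prime = abs(d) / d_max
    r2 = d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))
    return d_prime, r2


# Hypothetical case 1: equal allele frequencies, B always travels with A.
same_freq = ld_measures(p_ab=0.34, p_a=0.34, p_b=0.34)
# Hypothetical case 2: B always travels with A, but B is rarer than A.
nested = ld_measures(p_ab=0.30, p_a=0.34, p_b=0.30)
```

In the first case both D' and r^2 equal 1. In the second, D' is still 1 but r^2 drops to about 0.83, which shows that r^2 near 1 additionally requires closely matching allele frequencies, as seen for the chromosome 5 SNPs in Table 1.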

Figure 4: HaploView for Chromosome 5

4 Discussion

The 10 SNPs identified in our analysis were found using a cut-off of 1.0E-6 rather than a traditional multiple-testing correction, as no SNP reached significance after Bonferroni correction in our model. In addition, the lack of significance under the codominant model suggests a possible lack of power in the study. The study design was not optimal: the original CAMP data were underpowered for association analysis, and controls were needed that had not been gathered at the outset of the study. However, the fact that all but one of the significant SNPs are located in the same gene adds to the reliability of our conclusion of association with asthma. All 9 of the top 10 SNPs that are on chromosome 5 lie in the gene PDE4D, which is known to be associated with airway control [9]. These 9 SNPs include rs1588265, one of the top 5 SNPs individually associated with asthma in the analysis of Himes et al.

Given the robust association between a set of adjacent SNPs on chromosome 5 and asthma, future directions could include targeted genotyping of this region in an expanded cohort, or studies in cell culture to determine the mechanisms by which PDE4D influences asthmatic phenotypes. In particular, targeted mutagenesis could determine whether any implicated SNP impairs PDE4D expression or function. Expression QTL (eQTL) analysis could determine whether the SNP on chromosome 13 has a distal regulatory function.

5 Bibliography

[1] Himes BE, Hunninghake GM, Baurley JW, et al. Genome-wide Association Analysis Identifies PDE4D as an Asthma-Susceptibility Gene. The American Journal of Human Genetics (2009); 84: 581-593.
[2] Cookson WOCM. Asthma genetics. Chest (2002); 121: 7S-13S.
[3] Wiener AS, Zieve I, Fries JH. The inheritance of allergic disease. Ann Eugenics (1936); 7: 141-62.
[4] Los H, Koppelman GH, Postma DS. The importance of genetic influences in asthma. European Respiratory Journal (1999); 14(5): 1210-1227.
[5] The European Community Respiratory Health Survey Group. Genes for asthma? An analysis of the European Community Respiratory Health Survey. Am J Respir Crit Care Med (1997); 156: 1773-80.
Duffy DL, Martin NG, Battistutta D, Hopper JL, Matthews JD. Genetics of asthma and hayfever in Australian twins. Am Rev Respir Dis (1990); 142: 1351-1358.
[6] Harris JR, Magnus P, Samuelsen SO, Tambs K. No evidence for effects of family environment on asthma. A retrospective study of Norwegian twins. Am J Respir Crit Care Med (1997); 156: 43-49.
[7] Laitinen T, Rasanen M, Kaprio J, Koskenvuo M, Laitinen LA. Importance of genetic factors in adolescent asthma. Am J Respir Crit Care Med (1998); 157: 1073-1078.
[8] Lichtenstein P, Svartengren M. Genes, environments and sex: factors of importance in atopic diseases in 7 to 9-year-old Swedish twins. Allergy (1997); 52: 1079-1086.
[9] Skadhauge LR, Christensen K, Kyvik KO, Sigsgaard T. Genetic and environmental influence on asthma: a population based study of 11,688 Danish twin pairs. Eur Respir J (1999); 13: 8-14.
[10] Méhats C, Jin SL, Wahlstrom J, Law E, Umetsu DT, Conti M. PDE4D plays a critical role in the control of airway smooth muscle contraction. FASEB J (2003); 17(13): 1831-41.
[11] NHLBI/Division of Lung Diseases. The Childhood Asthma Management Program (CAMP): Design, Rationale, and Methods. Controlled Clinical Trials (1999); 20: 91-120.