Title: A robustness study of parametric and non-parametric tests in Model-Based Multifactor Dimensionality Reduction for epistasis detection

Author's response to reviews
Title: A robustness study of parametric and non-parametric tests in Model-Based Multifactor Dimensionality Reduction for epistasis detection
Authors: Jestinah M Mahachie John (jmahachie@ulg.ac.be), François Van Lishout (f.vanlishout@ulg.ac.be), Elena S Gusareva (gusareva.elena@gmail.com), Kristel Van Steen (kristel.vansteen@ulg.ac.be)
Version: 2
Date: 4 March 2013
Author's response to reviews: see over

03 March 2013

Dear Editor (BioData Mining),

We hereby submit a revised version of the manuscript 9193456947630351 entitled "A robustness study of parametric and non-parametric tests in Model-Based Multifactor Dimensionality Reduction for epistasis detection", and would be grateful if you would reconsider this manuscript for publication in BioData Mining.

We thank the editor for giving us the chance to resubmit this paper and to formulate responses to the comments given by the reviewers. We thank all reviewers for having taken the time to read our manuscript and for their constructive comments and suggestions to further improve it. We have addressed all of the concerns raised. Should our point-by-point response (provided below) not provide satisfactory replies, we remain available to provide additional information or more details.

Please direct all correspondence concerning this manuscript to jmahachie@ulg.ac.be. Thank you for taking an interest in our work. The authors declare no conflict of interest.

Yours sincerely,
On behalf of the authors,
Jestinah M. Mahachie John, PhD
Phone: +32 4 366 9637
Email: jmahachie@ulg.ac.be

Point-by-point replies to comments from the Editor and Reviewers

EDITOR

During the preparations for my PhD thesis defense, which involved setting up and interpreting additional simulations, we realized that the qq-plots provided in Figure 4 could be improved to better convey the intended messages. Therefore, as was done in my PhD thesis, we have replaced Figure 4 of the originally submitted version with new qq-plots of MB-MDR step 2 test values. As a result, we have altered some statements (not linked to reviewers' comments) in the main submission as follows. Text in red is newly inserted text. Deleted text has double strikethrough lines. Black text is as it was in the originally submitted manuscript.

In the Results section

Figure 4 shows the qq-plots related to the SNP pairs and their MB-MDR step 2 test statistics (i.e., the maximum of two association tests: one involving H-cells versus {L,O}-cells, and one involving L-cells versus {H,O}-cells). shows the qq-plots related to the SNP pairs and their final MB-MDR test results (squared Student's t). However, recreating Figure 3, now for cell (2,2) instead of (0,0) (hence, the multilocus genotype cell which has the smallest number of individuals contributing to it), also highlights hard-to-ignore deviations from the theoretical F(1,498) distribution at the multilocus genotype cell labeling stage (see Supplementary Figure S2). This suggests that the largely deviating results observed in Figure 4 are a cumulative effect and the result of subsequently building on invalid test results. The outlying upper right dots in Figure 4 refer to test results corresponding to the causal epistatic SNP pair. Other scenarios show similar trends (results not shown).
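As an illustration of the kind of comparison behind such qq-plots, the following minimal Python sketch pairs sorted squared Student's t statistics with quantiles of a theoretical F(1, dfd) distribution. It is not the code used for Figure 4: the function name `qq_against_f`, the plotting positions (i - 0.5)/n, and the use of NumPy and SciPy are assumptions made for this example only.

```python
import numpy as np
from scipy import stats


def qq_against_f(t_values, dfd):
    """Return (theoretical, observed) quantile pairs for a qq-plot.

    t_values : two-group Student's t statistics (hypothetical input)
    dfd      : denominator degrees of freedom, e.g. 498 for N = 500
    """
    # A squared Student's t with dfd degrees of freedom follows F(1, dfd)
    # when the test assumptions hold.
    observed = np.sort(np.asarray(t_values, dtype=float) ** 2)
    n = observed.size
    # Plotting positions (i - 0.5) / n: an assumed, commonly used convention.
    probs = (np.arange(1, n + 1) - 0.5) / n
    theoretical = stats.f.ppf(probs, 1, dfd)
    return theoretical, observed
```

Points far above the diagonal of the resulting plot indicate test values that are larger than the reference distribution would predict.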

Conclusion section

However, the overall performance of MB-MDR, which includes a permutation-based correction for multiple testing, is not affected in terms of type I error control. Improved power can be obtained by pre-analysis data transformations. MB-MDR permutation-based maxT correction for multiple testing keeps type I error and false positive rates under control, since in all considered simulation scenarios, the assumption of subset pivotality of the maxT permutation strategy was plausible.

Figure 4: Qq-plots of MB-MDR step 2 test values (squared Student's t), for normal and chi-squared trait distributions, and non-transformed or rank-transformed to normal data. For each setting, one replicate with epistatic variance 10% is considered and F-statistics are pooled for all SNP pairs over the 999 permutations. A theoretical F-distribution according to F(1,498) is taken as the reference. Qq-plots of observed final MB-MDR test values, for normal and chi-squared trait distributions, and non-transformed or rank-transformed to normal data. For each setting, one replicate with genetic variance 10% is considered. A generated F-distribution according to F(1,498) is taken as the reference.

REVIEWERS SECTION

We have responded to the 3 minor comments above.

We have added the following paragraph to the background section to address the reviewer's concern.

One of the pioneering methods used in the context of dimensionality reduction and gene-gene interaction detection is the Multifactor Dimensionality Reduction (MDR) method, initially developed by Ritchie et al. [2]. MDR offers an alternative to traditional regression-based approaches. The method is model-free and non-parametric in the sense that it does not assume any particular genetic model. In particular, MDR for binary traits [2] enforces a dimensionality reduction by pooling multilocus genotype classes into two groups of risk based on some threshold value, and by evaluating the epistasis model via cross-validation principles. One concern related to the initial implementations of the MDR method was that some important interactions could be missed due to pooling too many multilocus genotype classes together. Another concern was that the MDR method did not facilitate making adjustments for lower-order genetic effects or confounding factors. Lastly, it was somewhat disappointing that, after computationally intensive cross-validation and permutation-based significance assessment procedures, only a single best epistasis model was proposed. Over the years, several attempts have been made to further improve the MDR ideas of Ritchie et al. [2]; see for instance [3]. However, an MDR-based method was needed that could tackle all of the aforementioned issues within a unified framework and would flexibly accommodate different study designs of related and unrelated individuals. Model-Based Multifactor Dimensionality Reduction (MB-MDR) originated as such a unified dimensionality reduction approach. Like MDR, MB-MDR is an intrinsically non-parametric method, and thus avoids making hard-to-verify assumptions about genetic modes of inheritance. The original MB-MDR implementation in R by Calle et al. [4] suffered from its own drawbacks, the major one being the significance assessment of epistasis models, which was based on the derivation of MAF-dependent null distributions. These drawbacks were handled in subsequent C++ versions of the MB-MDR software, adhering to the key principles of the MB-MDR strategy [5]. In summary, these key features are 1) dimensionality reduction via multilocus genotype cell labeling using appropriate association tests, 2) prioritization of multiple epistasis models (on reduced constructs / lower-dimensional features) via appropriate association tests and adequate multiple testing corrections to control false positives, 3) possible adjustment for lower-order effects or confounders in relevant steps of the epistasis detection process.
Scale transformations are quite common as remedial strategies to meet statistical testing assumptions. However, since the optimal scale transformation is often based on theoretical motivations or statistical convenience, it often leads to new constructs that are hard to interpret or are biologically meaningless. Another concern related to implementing scale transformations is that non-additive signals may be removed as a direct consequence of such transformations prior to analysis [44]. Our results confirmed that rank-based transformations are generally most powerful when quantitative traits are non-normally distributed. Rank transformations serve as a bridge between non-parametrics and parametrics [45]. They naturally eliminate any problem of skewness (e.g., for a chi-squared distribution). By ranking, the impact of outliers is minimized: regardless of how extreme the most extreme observation is, it receives the same rank. A particular type of rank transformation uses percentile ranks and is referred to as rank transformation to normality. In this context, a percentile rank is defined as the proportion of quantitative trait outcomes in a distribution that a specific trait value is greater than or equal to. When the number of ties is negligible, it leads to a near-perfect normal distribution, irrespective of the original trait's distribution, which is usually a highly desirable property.

We remark that MB-MDR's dimensionality reduction step involves performing multiple ANOVA tests, one for each multilocus genotype cell, each comparing two groups: one group consisting of a single multilocus genotype class, and another group consisting of the pooled remaining multilocus genotype cells at the considered loci. We agree that the statistical properties of two-group comparison tests have been studied at length elsewhere, in particular in the presence of highly unbalanced groups. Such imbalances naturally arise in MB-MDR association testing during the risk cell labeling, since for two-way interactions one cell is contrasted against the 8 remaining cells. Hence, we agree that the generally known results for two-group comparison tests will coincide with our findings for each of these 9 tests separately. However, the 9 test results are used to construct a lower-dimensional feature with 3 possible factor levels (H, L or O), and it was unknown how the aggregation of different levels of model violation across the 9 aforementioned tests would affect the final MB-MDR association test (on aggregated H (L) cells). The key question that initiated this research (as was explained in the discussion section) was whether the large number of epistasis findings we usually obtained for quantitative traits (some of which simply had to be false positives) was due to an aggregation of model violations during internal association testing or due to another aspect of the MB-MDR method. Since not correcting for main effects was shown in earlier work to lead to increased false positives, we addressed the key question of interest by setting up simulations for pure epistasis models. No LD between markers was assumed. Interestingly, type I error was kept under control with the classical implementation of MB-MDR, even for non-normally distributed data.
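To make the two ingredients discussed above concrete, the following minimal Python sketch shows (a) a rank transformation to normality and (b) the labeling of the 9 two-locus genotype cells as H, L or O via one-cell-versus-rest Student's t tests. This is not the MB-MDR software: the function names, the offset of 0.5 in the percentile ranks (used here to avoid infinite normal quantiles), the threshold argument `alpha`, the minimum cell size of 2, and the use of NumPy and SciPy are all assumptions made for illustration.

```python
import numpy as np
from scipy import stats


def rank_to_normal(y):
    """Map trait values to normal quantiles of their percentile ranks."""
    y = np.asarray(y, dtype=float)
    ranks = stats.rankdata(y, method="average")  # ties share their mean rank
    # Assumed offset: keeps percentile ranks strictly inside (0, 1).
    pct = (ranks - 0.5) / len(y)
    return stats.norm.ppf(pct)


def label_cells(g1, g2, y, alpha=0.10):
    """Label each two-locus genotype cell as 'H', 'L' or 'O'.

    g1, g2 : genotypes coded 0/1/2 at the two loci
    y      : quantitative trait (raw or transformed)
    alpha  : threshold for each one-cell-versus-rest test (assumed 0.10)
    """
    g1, g2 = np.asarray(g1), np.asarray(g2)
    y = np.asarray(y, dtype=float)
    labels = {}
    for a in range(3):
        for b in range(3):
            in_cell = (g1 == a) & (g2 == b)
            # Assumption: cells with fewer than 2 individuals are not tested.
            if in_cell.sum() < 2 or (~in_cell).sum() < 2:
                labels[(a, b)] = "O"
                continue
            # Pooled-variance t test; its square is an F(1, N-2) statistic.
            t, p = stats.ttest_ind(y[in_cell], y[~in_cell])
            if p >= alpha:
                labels[(a, b)] = "O"  # no evidence for higher or lower mean
            else:
                labels[(a, b)] = "H" if t > 0 else "L"
    return labels
```

A call such as `label_cells(g1, g2, rank_to_normal(y))` would then give the H/L/O construct on which a second-stage association test could be performed. The sketch makes visible why each of the 9 tests contrasts a small group with a much larger one.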
Our work, in particular the inspection of the effects of MAFs on the distributions of the 9 individual test statistics as well as on the final MB-MDR test statistic distribution, showed that there may indeed be an accumulation of model violation problems (one that is MAF-dependent and may lead to different marginal final MB-MDR test distributions, possibly deviating strongly from the theoretical final MB-MDR test distribution). It also showed that our choice of 0.10 as default significance level for each of the 9 tests within MB-MDR, together with our choice of multiple testing correction (in particular the step-down maxT procedure as implemented in MB-MDR), was able to adequately control FWER regardless. The fact that more hits than expected are observed on real-life data as compared to synthetic data is attributed to the fact that for real-life data the assumption of subset pivotality is often violated. When this assumption is violated, strong control of the type I error with the step-down maxT algorithm is no longer guaranteed. Our work also highlighted the fact that p-values for SNP pairs should not be derived from the theoretical distribution but from resampling-based null distributions. The maxT procedure derives such null distributions and at the same time corrects for multiple testing over all possible pairs considered in the epistasis screening.

By saying that MB-MDR is non-parametric, we mean that MB-MDR does not make any parametric assumptions about the mode of epistatic inheritance. "Parametric" as in "parametric association tests" always refers to distributional assumptions that may or may not be violated. Throughout our manuscript, we have adhered to these original definitions. Our two-group comparison tests within MB-MDR involve Student's t tests, which are thus parametric tests. However, note that for testing whether two loci are globally associated with the trait of interest (i.e., no correction for main effects is performed), the MB-MDR method does not rely on any parametric regression modeling and is also, in this sense, non-parametric. For the latter, we rather use the term "not model-based". As soon as a correction is made for confounders (whether capturing an environmental measurement or evidence about population stratification), a shift towards parametric (regression-based) modeling is required. This explains the "MB" in MB-MDR: "Model-Based" allows you to integrate whatever is needed from the parametric regression frameworks in order to make the relevant adjustments. We agree that the whole idea behind (MB-)MDR was to avoid the model misspecifications that are so typical of high-dimensional (regression-based) modeling. However, we have shown that the most severe correction possible (maximum number of degrees of freedom) will assure adequate type I error control, although the method may become somewhat conservative while doing so.
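For readers less familiar with the step-down maxT procedure referred to above, a generic Python sketch of how resampling-based adjusted p-values can be computed is given here. It is not the MB-MDR implementation: the function name `step_down_maxt`, the input layout (one row of test statistics per permutation), and the (count + 1)/(B + 1) convention for p-values are assumptions made for this example.

```python
import numpy as np


def step_down_maxt(observed, permuted):
    """Step-down maxT adjusted p-values (larger statistic = more extreme).

    observed : shape (m,), one test statistic per SNP pair
    permuted : shape (B, m), the same statistics under B trait permutations
    """
    observed = np.asarray(observed, dtype=float)
    permuted = np.asarray(permuted, dtype=float)
    n_perm, m = permuted.shape

    # Order hypotheses from most to least extreme observed statistic.
    order = np.argsort(-observed)
    perm_sorted = permuted[:, order]

    # Successive maxima: column j holds the maximum over the hypotheses
    # ranked j, j+1, ..., m within each permutation.
    succ_max = np.maximum.accumulate(perm_sorted[:, ::-1], axis=1)[:, ::-1]

    # Count permutations whose successive maximum reaches the observed value.
    exceed = (succ_max >= observed[order]).sum(axis=0)
    # Assumed convention: count the observed data as one extra permutation.
    p_sorted = (exceed + 1.0) / (n_perm + 1.0)

    # Enforce monotonicity of adjusted p-values along the ordering.
    p_sorted = np.maximum.accumulate(p_sorted)

    adjusted = np.empty(m)
    adjusted[order] = p_sorted
    return adjusted
```

Because the null distribution is built from the maxima of the permuted statistics themselves, no theoretical reference distribution is needed. The strong FWER control of this construction relies on the subset pivotality assumption discussed above.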

We believe that the paragraph added to the discussion section (cf. Moore's second point) also covers these 2 points raised by Motsinger-Reif. We have corrected the typos the reviewer observed.