A Comparison of Sample Size and Power in Case-Only Association Studies of Gene-Environment Interaction

Size: px
Start display at page:

Download "A Comparison of Sample Size and Power in Case-Only Association Studies of Gene-Environment Interaction"

Transcription

1 American Journal of Epidemiology ª The Author Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License ( which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. Vol. 171, No. 4 DOI: /aje/kwp398 Advance Access publication: January 4, 2010 Practice of Epidemiology A Comparison of Sample Size and Power in Case-Only Association Studies of Gene-Environment Interaction Geraldine M. Clarke* and Andrew P. Morris * Correspondence to Dr. Geraldine M. Clarke, Wellcome Trust Centre for Human Genetics, University of Oxford, Roosevelt Drive, Oxford OX3 7BN, United Kingdom ( gclarke@well.ox.ac.uk). Initially submitted April 15, 2009; accepted for publication November 9, Assuming continuous, normally distributed environmental and categorical genotype variables, the authors compare 6 case-only designs for tests of association in gene-environment interaction. Novel tests modeling the environmental variable as either the response or the predictor and allowing a genetic variable with multiallelic variants are included. The authors show that tests imposing the same genotypic pattern of inheritance perform similarly regardless of whether genotype is the response variable or the predictor variable. The novel tests using the genetic variable as the response variable are advantageous because they are robust to non-normally distributed environmental exposures. Dominance deviance deviation from additivity in the main or interaction effects is key to test performance: When it is zero or modest, tests searching for a trend with increasing risk alleles are optimal; when it is large, tests for genotypic effects are optimal. However, the authors show that dominance deviance is attenuated when it is observed at a proxy locus, which is common in genome-wide association studies, so large dominance deviance is likely to be rare. The authors conclude that the trend test is the appropriate tool for large-scale association scans where the true gene-environment interaction model is unknown. The common practice of assuming a dominant pattern of inheritance can cause serious losses of power in the presence of any recessive, or modest dominant, effects. dominance; genome-wide association study; interaction; linkage disequilibrium; lung neoplasms; power; research design; sample size Abbreviations: ERCC2, excision repair cross-complementing group 2; G-E, gene-environment. Gene-environment (G-E) interaction refers to variation in the effect of genetic factors, measured in a suitable scale, as a function of variation in environmental exposure and vice versa. Case-only study designs have been shown to be more powerful than case-control study designs in the assessment of possible interactions between genetic factors and environmental exposures in the etiology of a disease (1, 2). The ability to test for G-E interaction using the case population only may be advantageous for example, in studies that rely on public population control data for which information on environmental exposures of interest is not available. Assuming independence between environment and genotype in the population of controls, Begg and Zhang (3), Piegorsch et al. (2), Umbach and Weinberg (4), and Yang et al. (5) showed that efficient estimates of interaction for categorical environment and binary genotype variables can be obtained via logistic regression in a case-only analysis. Albert et al. (6), Armstrong (7), and Cheng (8) explored extensions to continuous environment and categorical genotype variables using both logistic and multinomial regression. Typically, the genetic factor is treated as the response variable, with the environmental exposure being used as the predictor variable; the converse, whereby the response variable is the environmental exposure and thus the effect of a genetic factor on the association between the environmental exposure and disease is studied, has received less attention. With the genetic factor used as the predictor variable rather than the response variable, there is greater freedom to categorize it appropriately. By definition, a successful finding in either situation represents a G-E interaction. Kraft et al. (9) 498

2 Case-Only G-E Interaction 499 compared the power and sample-size requirements of various case-control study designs for testing for G-E interaction to a single case-only design that assumed a dominant inheritance pattern at the genetic locus. Like other investigators (2, 5), Kraft et al. showed that in the presence of any interaction effect, the case-only design was most powerful. Here we consider a categorical genetic variable G and a normally distributed environmental exposure variable E. We assume independence between G and E in the population under study and compare the power and sample-size requirements of 6 case-only tests for detecting G-E interaction. Three categorizations of G are considered: binary, ordinal, and nominal. The first 3 tests model E as the response variable using a linear regression with each category of G as the predictor. The remaining 3 tests use multinomial regression techniques to allow each categorization of G to be the response variable, with E as the predictor variable: A logistic regression (this test is the same as that examined by Kraft et al. (9)), a proportional odds regression, and a multinomial regression are performed for binary, ordinal, and nominal codings of G. Each test is considered under 3 different penetrance models that are based on logistic risk models. MATERIALS AND METHODS Models Let D be an indicator of disease (D ¼ 1 if affected and D ¼ 0 otherwise), E a standard normally distributed environmental exposure variable, and g a genotype at a disease susceptibility locus G with alleles A and a.letp A denote the prevalence of the minor allele, A. Let G denote the minor allele count: G ¼ 0 when g ¼ a/a, G ¼ 1 when g ¼ a/a, and G ¼ 2 when g ¼ A/A. Let G D equal 1 when g ¼ a/a; otherwise, G D ¼ 0. G is then an additive genotype coding and G D is its deviation from additivity. We consider true penetrance models based on the log odds of disease: PrðD ¼ 1j G; EÞ log ¼ b c þ b G G þ b GD G D þ b E E PrðD ¼ 0j G; EÞ þ b þ b D D : ð1þ b G represents an additive genetic main effect with b GD its deviation from additivity, hereafter referred to as dominance deviance; b E represents an environmental main effect; and b represents an additive G-E interaction effect with b D its dominance deviance. In practical terms, b E represents the log odds ratio for disease risk associated with a 1-unit increase in E among persons with no minor alleles; b G þ b GD and 2b G are the log odds ratios associated with an increase of 1 and 2 minor alleles, respectively, in persons with zero exposure, etc. We assume that G and E are independent in the population under study. Cases are persons affected by the disease of interest (D ¼ 1). Below we describe the 6 case-only tests considered. Each test is based on a model that generates a log likelihood, and tests of G-E interaction are based on standard likelihood ratio statistics. The specific likelihoods associated with each model and our approach to power and sample-size estimation are described in the Web Appendix, which is posted on the Journal s Web site ( Linear regression additive ( T ) model. The relation between E and G is modeled as a simple linear regression assuming additive genetic effects at the genetic locus: EðEjGÞ ¼b ð1þ c þ b ð1þ G: The likelihood ratio test constrains b ð1þ [ 0 under the null hypothesis and has 1 df. This test assumes a trend in risk with an increasing number of risk alleles and is referred to as a trend test. Linear regression dominant ( D ) model. The relation between E and G is modeled as a simple linear regression, assuming a dominant pattern of inheritance at the genetic locus: G DOM ¼ 1 for carriers of the minor allele and zero otherwise. EðEjGÞ ¼b ð2þ c þ b ð2þ DOM G DOM : The likelihood ratio test constrains b ð2þ DOM [ 0 under the null hypothesis and has 1 df. This test is referred to as a dominant test. Linear regression deviation additivity ( G ) model. The relation between E and G is modeled as a simple linear regression but with the introduction of a term to model any dominance deviance associated with the interaction: EðEjGÞ ¼b ð3þ c þ b ð3þ G þ bð3þ D G D : The likelihood ratio test constrains b ð3þ [ 0 and bð3þ D [ 0 under the null hypothesis and has 2 df. This test allows independent genotype effects and is referred to as a genotypic test. Logistic regression dominant (G D E) model. The relation between carriers of the minor allele and E is modeled as a logistic regression: log PðGDOM ¼ 1j EÞ ¼ b ð4þ c þ b ð4þ PðG DOM ¼ 0j EÞ E: If the genetic factor and the environmental exposure are independent among the cases, then b ð4þ is a consistent estimator of the interaction parameter measuring the departure of the joint effects of E and G DOM from multiplicative risk ratios (8). If the disease is rare, then b ð4þ is also a good estimator of the interaction parameter measuring the departure of the joint effects of E and G DOM from multiplicative odds ratios. The likelihood ratio test constrains b ð4þ [ 0 under the null hypothesis of no interaction and has 1 df. Like D, this test assumes a dominant pattern of inheritance at the genetic locus and is referred to as a dominant test. Proportional odds regression (G T E) model. We compare the probability of an equal or smaller number of high-risk

3 500 Clarke and Morris alleles, G k, to the probability of a larger number, G > k, as a function of the continuous environmental exposure E: PðG kj EÞ log ¼ b ð5þ ck bð5þ E for k ¼ 0; 1: PðG > kj EÞ If b ð5þ > 0, then increasing values of E are associated with increasing genetic risk in the case population. The likelihood ratio test constrainsb ð5þ [ 0 under the null hypothesis of no interaction and has 1 df. Like T, this test assumes a trend in risk with an increasing number of risk alleles and is referred to as a trend test. Multinomial regression genetic response (G G E) model. The probability of observing 1 or 2 minor alleles is compared with the probability of observing none: PðG ¼ kj EÞ log ¼ b ð6þ ck PðG ¼ 0j EÞ þ bð6þ ke for k ¼ 1; 2: The likelihood ratio test constrainsb ð6þ 1 [ 0 and bð6þ 2 [ 0 under the null hypothesis and has 2 df. Like G, this test allows independent genotype risks and is referred to as a genotypic test. Power calculations We calculated the number of cases required to achieve 80% statistical power, as well as the power for 1,000 case participants, to detect G-E interaction at the disease susceptibility locus G. We considered a range of minor allele frequencies (p A ¼ 0.1, 0.25), genetic main effects (b G ¼ 0, log(1.3)), environmental main effects (b E ¼ 0, log(1.3)), and G-E interaction effects (b ¼ 0, log(1.1), log(1.2), log(1.3)). To reduce the dimensionality of the model parameter space, we assumed a common value d for the dominance deviance associated with main and interaction genetic effects: d ¼ b GD ¼ b D.Whend ¼ 0, the model is additive and when d > 0(respectively,d < 0), dominant (respectively, recessive) main and interaction genetic effects are present in the underlying model. We considered a range of realistic disease models by allowing the dominance deviance to vary within the range 6minðb G ; b Þ to ensure no heterozygote advantage: If d > b, for example, then the penetrance associated with the heterozygotic genotype may be greater than the homozygotic genotype giving a heterozygote advantage. We also performed similar calculations assuming that the interaction was observed at a nearby noncausal locus H with minor allele frequency p B ¼ 0.2 and the maximum value of linkage disequilibrium r 2 between alleles at the disease susceptibility locus G. There are algebraic restrictions on the range of r 2 (10), and the maximum value of linkage disequilibrium between alleles at G and H when p B ¼ 0.2 depends on the value of p A : When p A ¼ 0.1, maximum r 2 ¼ 0.44; when p A ¼ 0.25, maximum r 2 ¼ 0.75; and when p A ¼ 0.4, maximum r 2 ¼ All tests assumed a 2-sided alternative hypothesis and a significance level of One of the underlying assumptions of the tests of interaction with E as the response, T, D, and G, is that E is normally distributed. In order to assess the effect of nonnormality of E, we calculated power when only cases with extreme values of E, jej > 1, were included and when 10% Table 1. Sampling Units for Case-Only Tests of Gene-Environment Interaction at a Disease Susceptibility Locus G Under Zero Deviation from Additivity a Additive d 5 0 p A e b G e be e b T D G G D E G T E G G E , , , , , , , , , , , , , , , , a Number of sampling units required for case-only test of geneenvironment interaction T to achieve 80% power for a range of environmental and genetic main effects and gene-environment interaction effects under zero dominance deviance and the ratio of sampling units required in each of the other tests to achieve the same power in comparison with the number required in T. Dominance deviance is defined as the deviation from additivity in the main or interaction effects: Here we assume that the dominance deviance in the main effect and the interaction effect are the same. A sampling unit is a case.

4 Case-Only G-E Interaction 501 Table 2. Sampling Units for Case-Only Tests of Gene-Environment Interaction at a Disease Susceptibility Locus G Under Positive Deviation from Additivity a p A e bg e be e b (d > 0 ½d 5 logð1:1þš) Positive Deviation from Additivity T D G G D E G T E G G E , , , , , , , , , , , , a Number of sampling units required for case-only test of geneenvironment interaction T to achieve 80% power for a range of environmental and genetic main effects and gene-environment interaction effects under positive dominance deviance and the ratio of sampling units required in each of the other tests to achieve the same power in comparison with the number required in T. Dominance deviance is defined as the deviation from additivity in the main or interaction effects: Here we assume that the dominance deviance in the main effect and the interaction effect are the same. A sampling unit is a case. Table 3. Sampling Units for Case-Only Tests of Gene-Environment Interaction at a Disease Susceptibility Locus G Under Negative Deviation from Additivity a p A e bg e be e b (d < 0 ½d 52logð1:1ÞŠ) Negative Deviation from Additivity T D G G D E G T E G G E , , , , , , , , , , , , , , , , , , , , a Number of sampling units required for case-only test of geneenvironment interaction T to achieve 80% power for a range of environmental and genetic main effects and gene-environment interaction effects under negative dominance deviance and the ratio of sampling units required in each of the other tests to achieve the same power in comparison with the number required in T. Dominance deviance is defined as the deviation from additivity in the main or interaction effects: Here we assume that the dominance deviance in the main effect and the interaction effect are the same. A sampling unit is a case.

5 502 Clarke and Morris Figure 1. Statistical power to detect gene-environment interaction for 1,000 case participants as a function of the dominance deviance d for tests T (solid lines), D (dashed lines), G (dotted lines), and G T E (dashed-dotted lines) at a disease susceptibility locus G for various values of the high-risk allele frequency p A and interaction effect size b :A)p A ¼ 0.1, b ¼ log(1.1); B) p A ¼ 0.1, b ¼ log(1.2); C) p A ¼ 0.1, b ¼ log(1.3); D) p A ¼ 0.25, b ¼ log(1.1); E) p A ¼ 0.25, b ¼ log(1.2); F) p A ¼ 0.25, b ¼ log(1.3). Dominance deviance is defined as the deviation from additivity in the main or interaction effects: Here we assume that the dominance deviance in the main effect and the interaction effect are the same. Results are shown for the situation where there is no main environmental effect, b E ¼ 0, and the main genetic effect is b G ¼ logð1:3þ; variation in the sizes of main environmental or genetic effects resulted in only minor variations in power and not did not alter conclusions regarding the relative efficacies of the tests considered. The environmental exposure variable was normally distributed and standardized. of cases were allowed to have extreme values of E with 9 times the standard deviation of the remaining 90% of cases. Example: study of G-E interaction for lung cancer In a case-control study, Zhou et al. (11) found an interaction between polymorphisms in the excision repair crosscomplementing group 2 (ERCC2) gene and cumulative cigarette smoking exposure in lung cancer. They studied 1,092 cases and 1,240 controls selected from the same Caucasian population. Using parameter values estimated from their analyses and a true penetrance model of the form given in equation 1, we calculated the number of samples that would be required to achieve 80% power in a standard case-control test of G-E interaction, as well as in our 6 case-only tests. RESULTS General design comparisons The number of cases required to achieve 80% power in each of the dominant tests, G D E and D, hardly differs for any set of parameter values studied (Tables 1, 2, and 3). Similarly, approximately equivalent numbers of cases are also required to achieve the same power in the genotypic tests, G and G G E, with the largest differences being seen at the largest main interaction effect sizes. The trend tests T and G T E require similar numbers of cases to achieve the same power under a strictly additive model but show greater differences in the presence of any dominance deviance. Figure 1 shows the power of the tests to detect G-E interaction with 1,000 case participants as a function of the dominance deviance when observing genotypes at a disease susceptibility locus. Under the null hypothesis, when there is no G-E interaction present in the underlying model (b ¼ 0 and d ¼ 0), none of the tests show any increase in the false-positive rate; that is, they all have power equal to the type I error rate. In the presence of no dominance deviance or a slight dominance deviance, d 0, the trend tests are the most powerful. For example, Figure 1, part D, shows that the trend tests are most powerful for dominance deviance in the range 0.06 < d < While the strictly additive trend test T appears to be marginally preferable compared with the proportional odds trend test G T E, there is little information with which to choose between the 2 tests at low values of dominance deviance.

6 Case-Only G-E Interaction 503 Figure 2. Statistical power to detect gene-environment interaction for 1,000 case participants as a function of the dominance deviance d for tests T (solid lines), D (dashed lines), G (dotted lines), and G T E (dashed-dotted lines) at a noncausal locus H with minor allele frequency p B ¼ 0.2 for various values of the minor allele frequency p A and interaction effect size b at the disease susceptibility locus G:A)p A ¼ 0.1, b ¼ log(1.1); B) p A ¼ 0.1, b ¼ log(1.2); C) p A ¼ 0.1, b ¼ log(1.3); D) p A ¼ 0.25, b ¼ log(1.1); E) p A ¼ 0.25, b ¼ log(1.2); F) p A ¼ 0.25, b ¼ log(1.3). The alleles at H are in maximum linkage disequilibrium (r 2 ) with the alleles at G: when p A ¼ 0.1, r 2 ¼ 0.44; when p A ¼ 0.25, r 2 ¼ 0.75; and when p A ¼ 0.4, r 2 ¼ Dominance deviance is defined as the deviation from additivity in the main or interaction effects: Here we assume that the dominance deviance in the main effect and the interaction effect are the same. Results are shown for the situation where there is no main environmental effect, b E ¼ 0, and the main genetic effect is b G ¼ logð1:3þ; variation in the sizes of main environmental or genetic effects resulted in only minor variations in power and not did not alter conclusions regarding the relative efficacies of the tests considered. The environmental exposure variable was normally distributed and standardized. For larger positive dominance deviance, d >> 0, indicating large dominant effects, the dominant tests are, as expected, most powerful. However, for any negative dominance deviance, d < 0, suggesting a tendency towards recessive effects at the genetic locus, the dominant tests perform poorly in comparison with other tests. The genotypic tests are only optimal in the presence of large absolute values of d: When large and dominant (d >> 0), they equal the dominant tests, and when large and recessive (d << 0), they outshine all of the other tests. But how likely is a large dominance deviance? When testing at a noncausal locus with alleles that are not in complete linkage disequilibrium with alleles at the causal locus (r 2 < 1), common in genome-wide association studies, the observed effects are attenuated; in particular, the apparent dominance deviance at the noncausal locus is reduced in comparison with that observed at the causal locus (see Web Appendix ( for discussion). Thus, the chance of a large dominance deviance is even less when testing at a noncausal locus, suggesting that use of genotypic tests must be carefully considered and that the comparison between tests in the presence of low dominance deviance is of the greatest practical importance. Figure 2 shows the power of the tests to detect G-E interaction with 1,000 case participants as a function of the dominance deviance when observing genotypes at a noncausal locus in linkage disequilibrium with the disease susceptibility locus described in Figure 1. While power, in general, is lower when testing at a noncausal locus, a comparison of Figures 1 and 2 indicates that the same pattern of test efficacy according to variation in d holds when testing at a noncausal locus as when testing at the disease susceptibility locus, except that the range of values of d for which the trend tests are optimal is wider. When cases were selectively sampled so that only those with extreme values of E, jej > 1, were included, the power of all tests was increased, but there were no changes in conclusions regarding relative test efficacies (results not shown). This increase in power under selective sampling was previously observed by Boks et al. (12). Furthermore, under this modest nonnormality, there was no increase in the false-positive error rate in any of the tests when no G-E interaction was present in the underlying model. When more extreme departure from normality was allowed, so that 10% of cases were allowed to have environmental exposure values with standard deviations 9 times that of the remaining

7 504 Clarke and Morris 90% of cases, the type I error rate for the tests with E as the response variable did increase, but only marginally for example, from 0.05 to an average of in T. Results suggest that the tests T, D, and G, where E is assumed to be a normally distributed response, are robust to this type of nonnormality. Example: study of G-E interaction for lung cancer Zhou et al. (11) considered a standard case-control test of G-E interaction between the ERCC2 polymorphism Asp312Asn and the square root of pack-years of smoking which adjusted for age, sex, smoking status, time since smoking cessation (years), and the interaction between smoking status and the square root of pack-years. The frequency of the Asp allele (Asp312Asn) was 0.3, the square root of pack-years was approximately normally distributed with mean and standard deviation 3, and the adjusted odds ratios for lung cancer risk in persons with zero pack-years were 1.5 (95% confidence interval: 1.0, 2.3) for the Asp/Asn genotype and 3.4 (95% confidence interval: 1.9, 6.0) for the Asn/Asn genotype (when each was compared with the Asp/ Asp genotype). These risks decreased by exp( 0.07Ok) for the heterozygous Asp/Asn genotype and by exp( 0.17Ok) for the homozygous Asn/Asn genotype for k pack-years. The decrease in risk was not significant for the homozygous genotype and was only significant up to approximately 30 pack-years for the heterozygous genotype. Further analysis of Zhou et al. s results suggests that the unadjusted odds ratio for lung cancer risk in persons with the Asp/Asp genotype associated with k pack-years of smoking was in the approximate range exp(0.3ok) to exp(0.4ok). Assuming these parameter values, Table 4 compares the numbers of samples required to achieve 80% power in a standard case-control test of G-E interaction, as well as in our 6 case-only tests. Our results are unadjusted for any other covariates. The case-only tests require substantially fewer sampling units than the case-control tests. The most powerful test is T, closely followed by G T E. These tests both search for a trend in the interaction with increasing number of risk alleles. The model virtually stipulates zero dominance deviance associated with the interaction effect (b D ¼ 0:01), indicating that the extra degree of freedom required by using G and G G E rather than T or G T Eis not warranted. However, the estimated dominance deviance for the main genetic effect is negative (b GD ¼ 0:22), suggesting a slight recessive effect and making it much more difficult for the tests imposing a dominant model on the genotype, D and G D E, to detect effects. Notice that the marginal effect size of pack-years on lung cancer risk does not alter conclusions regarding the relative test efficacies. DISCUSSION We compared the power and sample-size requirements of 6 case-only tests for detecting interaction between a genetic factor and a continuous environmental exposure in association with disease. Under the null hypothesis of no G-E interaction, none of the tests showed any increase in the Table 4. Numbers of Sampling Units Required to Achieve 80% Power to Detect a Gene-Environment Interaction Between the ERCC2 Asp312Asn Polymorphism and Pack-Years of Smoking a b E Case-Control Case-Only Tests Test T D G G D E G T E G G E , , , , , , Abbreviation: ERCC2, excision repair cross-complementing group 2. a The true penetrance model is of the form given in equation 1 (see text), and effect sizes are b G ¼ 0:61; b GD ¼ 0:22; b ¼ 0:09; and b D ¼ 0:01, as estimated by Zhou et al. (11). Results are shown for values of b E as indicated for a 2-df case-control test of G-E interaction and our 6 case-only tests of G-E interaction. A sampling unit is a pair for the case-control test and a case for the case-only tests. false-positive rate, while under the complementary hypothesis, all of the tests had power greater than the type I error rate. We conclude that these tests are valid tests of G-E interaction. More work will be required to interpret the estimated coefficients from these case-only regressions in terms of risk ratios and other measures of effect size. In order to simplify analyses, a commonly used approach in case-only tests of G-E interaction is to assume either an autosomal dominant or recessive inheritance pattern at the genetic locus (8, 9, 13, 14). Such models are represented in the tests G D E and D. The novel tests based on multinomial and proportional odds models that we considered here allow for alternative patterns of genetic inheritance. For normally distributed environmental exposures, we have shown that D, a test using linear regression with the environmental variable as the response variable, requires similar numbers of cases as G D E, a test using logistic regression with the genetic variable as the response variable, in order to achieve the same power. Similarly, G and G G E also require similar numbers of cases to achieve equivalent power. This is as expected, since these pairs of tests each model the genetic variable in the same manner; the genetic variant is modeled as dominant in G D E and D and as genotypic in G and G G E. Whereas T imposes an additive model on the genetic variable, G T E only imposes a trend in the form of its proportional odds assumption. As such, these 2 trend tests show greater differences between the numbers of cases required to achieve the same power in any given scenario. T, D, and G depend on an assumption that the environmental exposure is normally distributed, and while we have shown that departure from normality need not have a significant impact on power outcomes, G D E, G T E, and G G E are clearly advantageous, since they do not require this assumption and are robust to non-normally distributed environmental exposures (15). Careful consideration of the scale of measurement is important, because changing the

8 Case-Only G-E Interaction 505 scale, such as transforming a quantitative environmental variable, can remove or add G-E interaction. We have shown that the dominance deviance is key to determining the optimal test. For zero (suggesting strictly additive genetic effects) or modest (suggesting slight recessive or dominant genetic effects) dominance deviance, the trend tests, T and G T E, are optimal. For large positive dominance deviance (suggesting a large dominant genetic effect), the dominant tests D and G D E are strictly optimal, closely followed by the genotypic tests G and G G E. However, if the dominance deviance is actually negative (suggesting a recessive genetic effect), the choice of a dominant test can have severe implications for power, and in the presence of a large recessive genetic effect, only a genotypic test will be suitable. Hence, if the dominance deviance is large but of unknown direction, the genotypic test is the safest choice. However, we showed that when testing at a noncausal locus, the dominance deviance at the observed locus is attenuated. Thus, for the genotypic tests to outperform the trend tests, dominance deviance at the disease susceptibility locus must be very large when testing for G-E interaction at a noncausal locus with alleles in less-than-perfect linkage disequilibrium with alleles at the causal locus. We conclude that the trend test is the most robust choice for large-scale association scans where the true G-E interaction model is unknown. The common practice of assuming a dominant pattern of inheritance can cause serious losses of power in the presence of any recessive, or modest dominant, effects. The exemplary data method used to calculate power relies on asymptotic distributions, and the results presented here are valid for the large sample sizes typically seen in current association studies. In critical applications, simulation studies would be required to determine precise power estimates (16). The case-only tests presented here are only valid under the assumption of independence between gene and environmental variables; any dependence between these variables will be reflected in the case series even in the absence of G-E interaction and will lead to biased estimates of interaction. Independence is therefore required to protect against falsepositive findings. Empirical studies have suggested that violation of gene-environment independence, when it occurs, is usually modest (17), but more work will be required to categorize the impact of departure from gene-environment independence on the relative efficacies of the tests considered here. ACKNOWLEDGMENTS Author affiliation: Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom (Geraldine M. Clarke, Andrew P. Morris). This work was supported by the Wellcome Trust. Conflict of interest: none declared. REFERENCES 1. Khoury MJ, Flanders WD. Nontraditional epidemiologic approaches in the analysis of gene-environment interaction: case-control studies with no controls! Am J Epidemiol. 1996; 144(3): Piegorsch WW, Weinberg CR, Taylor JA. Non-hierarchical logistic models and case-only designs for assessing susceptibility in population-based case-control studies. Stat Med. 1994;13(2): Begg CB, Zhang ZF. Statistical analysis of molecular epidemiology studies employing case-series. Cancer Epidemiol Biomarkers Prev. 1994;3(2): Umbach DM, Weinberg CR. Designing and analysing casecontrol studies to exploit independence of genotype and exposure. Stat Med. 1997;16(15): Yang Q, Khoury MJ, Flanders WD. Sample size requirements in case-only designs to detect gene-environment interaction. Am J Epidemiol. 1997;146(9): Albert PS, Ratnasinghe D, Tangrea J, et al. Limitations of the case-only design for identifying gene-environment interactions. Am J Epidemiol. 2001;154(8): Armstrong BG. Fixed factors that modify the effects of timevarying factors: applying the case-only approach. Epidemiology. 2003;14(4): Cheng KF. A maximum likelihood method for studying geneenvironment interactions under conditional independence of genotype and exposure. Stat Med. 2006;25(18): Kraft P, Yen YC, Stram DO, et al. Exploiting geneenvironment interaction to detect genetic associations. Hum Hered. 2007;63(2): Weir BS. Inferences about linkage disequilibrium. Biometrics. 1979;35(1): Zhou W, Liu G, Miller DP, et al. Gene-environment interaction for the ERCC2 polymorphisms and cumulative cigarette smoking exposure in lung cancer. Cancer Res. 2002;62(5): Boks MP, Schipper M, Schubart CD, et al. Investigating gene environment interaction in complex diseases: increasing power by selective sampling for environmental exposure. Int J Epidemiol. 2007;36(6): Stern MC, Johnson LR, Bell DA, et al. XPD codon 751 polymorphism, metabolism genes, smoking, and bladder cancer risk. Cancer Epidemiol Biomarkers Prev. 2002;11(10): Yang Q, Khoury MJ, Sun F, et al. Case-only design to measure gene-gene interaction. Epidemiology. 1999;10(2): O Brien PC. Comparing 2 samples: extensions of the t, ranksum, and log-rank tests. J Am Stat Assoc. 1988;83(401): Brown BW, Lovato J, Russell K. Asymptotic power calculations: description, examples, computer code. Stat Med. 1999; 18(22): Liu X, Fallin MD, Kao WH. Genetic dissection methods: designs used for tests of gene-environment interaction. Curr Opin Genet Dev. 2004;14(3):

Flexible Matching in Case-Control Studies of Gene-Environment Interactions

Flexible Matching in Case-Control Studies of Gene-Environment Interactions American Journal of Epidemiology Copyright 2004 by the Johns Hopkins Bloomberg School of Public Health All rights reserved Vol. 59, No. Printed in U.S.A. DOI: 0.093/aje/kwg250 ORIGINAL CONTRIBUTIONS Flexible

More information

breast cancer; relative risk; risk factor; standard deviation; strength of association

breast cancer; relative risk; risk factor; standard deviation; strength of association American Journal of Epidemiology The Author 2015. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health. All rights reserved. For permissions, please e-mail:

More information

Allowing for Missing Parents in Genetic Studies of Case-Parent Triads

Allowing for Missing Parents in Genetic Studies of Case-Parent Triads Am. J. Hum. Genet. 64:1186 1193, 1999 Allowing for Missing Parents in Genetic Studies of Case-Parent Triads C. R. Weinberg National Institute of Environmental Health Sciences, Research Triangle Park, NC

More information

Statistical Tests for X Chromosome Association Study. with Simulations. Jian Wang July 10, 2012

Statistical Tests for X Chromosome Association Study. with Simulations. Jian Wang July 10, 2012 Statistical Tests for X Chromosome Association Study with Simulations Jian Wang July 10, 2012 Statistical Tests Zheng G, et al. 2007. Testing association for markers on the X chromosome. Genetic Epidemiology

More information

For more information about how to cite these materials visit

For more information about how to cite these materials visit Author(s): Kerby Shedden, Ph.D., 2010 License: Unless otherwise noted, this material is made available under the terms of the Creative Commons Attribution Share Alike 3.0 License: http://creativecommons.org/licenses/by-sa/3.0/

More information

Selection Bias in the Assessment of Gene-Environment Interaction in Case-Control Studies

Selection Bias in the Assessment of Gene-Environment Interaction in Case-Control Studies American Journal of Epidemiology Copyright 2003 by the Johns Hopkins Bloomberg School of Public Health All rights reserved Vol. 158, No. 3 Printed in U.S.A. DOI: 10.1093/aje/kwg147 Selection Bias in the

More information

(b) What is the allele frequency of the b allele in the new merged population on the island?

(b) What is the allele frequency of the b allele in the new merged population on the island? 2005 7.03 Problem Set 6 KEY Due before 5 PM on WEDNESDAY, November 23, 2005. Turn answers in to the box outside of 68-120. PLEASE WRITE YOUR ANSWERS ON THIS PRINTOUT. 1. Two populations (Population One

More information

Assessing Gene-Environment Interactions in Genome-Wide Association Studies: Statistical Approaches

Assessing Gene-Environment Interactions in Genome-Wide Association Studies: Statistical Approaches Research Report Assessing Gene-Environment Interactions in Genome-Wide Association Studies: Statistical Approaches Philip C. Cooley, Robert F. Clark, and Ralph E. Folsom May 2014 RTI Press About the Authors

More information

Selection and Combination of Markers for Prediction

Selection and Combination of Markers for Prediction Selection and Combination of Markers for Prediction NACC Data and Methods Meeting September, 2010 Baojiang Chen, PhD Sarah Monsell, MS Xiao-Hua Andrew Zhou, PhD Overview 1. Research motivation 2. Describe

More information

Score Tests of Normality in Bivariate Probit Models

Score Tests of Normality in Bivariate Probit Models Score Tests of Normality in Bivariate Probit Models Anthony Murphy Nuffield College, Oxford OX1 1NF, UK Abstract: A relatively simple and convenient score test of normality in the bivariate probit model

More information

Daniel Boduszek University of Huddersfield

Daniel Boduszek University of Huddersfield Daniel Boduszek University of Huddersfield d.boduszek@hud.ac.uk Introduction to Multinominal Logistic Regression SPSS procedure of MLR Example based on prison data Interpretation of SPSS output Presenting

More information

Modeling Binary outcome

Modeling Binary outcome Statistics April 4, 2013 Debdeep Pati Modeling Binary outcome Test of hypothesis 1. Is the effect observed statistically significant or attributable to chance? 2. Three types of hypothesis: a) tests of

More information

Gene-Environment Interactions

Gene-Environment Interactions Gene-Environment Interactions What is gene-environment interaction? A different effect of an environmental exposure on disease risk in persons with different genotypes," or, alternatively, "a different

More information

Using Imputed Genotypes for Relative Risk Estimation in Case-Parent Studies

Using Imputed Genotypes for Relative Risk Estimation in Case-Parent Studies American Journal of Epidemiology Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health 011. Vol. 173, No. 5 DOI: 10.1093/aje/kwq363 Advance Access publication:

More information

Analysis of single gene effects 1. Quantitative analysis of single gene effects. Gregory Carey, Barbara J. Bowers, Jeanne M.

Analysis of single gene effects 1. Quantitative analysis of single gene effects. Gregory Carey, Barbara J. Bowers, Jeanne M. Analysis of single gene effects 1 Quantitative analysis of single gene effects Gregory Carey, Barbara J. Bowers, Jeanne M. Wehner From the Department of Psychology (GC, JMW) and Institute for Behavioral

More information

Unit 1 Exploring and Understanding Data

Unit 1 Exploring and Understanding Data Unit 1 Exploring and Understanding Data Area Principle Bar Chart Boxplot Conditional Distribution Dotplot Empirical Rule Five Number Summary Frequency Distribution Frequency Polygon Histogram Interquartile

More information

Business Statistics Probability

Business Statistics Probability Business Statistics The following was provided by Dr. Suzanne Delaney, and is a comprehensive review of Business Statistics. The workshop instructor will provide relevant examples during the Skills Assessment

More information

REVIEW ARTICLE. A Review of Inferential Statistical Methods Commonly Used in Medicine

REVIEW ARTICLE. A Review of Inferential Statistical Methods Commonly Used in Medicine A Review of Inferential Statistical Methods Commonly Used in Medicine JCD REVIEW ARTICLE A Review of Inferential Statistical Methods Commonly Used in Medicine Kingshuk Bhattacharjee a a Assistant Manager,

More information

Statistics as a Tool. A set of tools for collecting, organizing, presenting and analyzing numerical facts or observations.

Statistics as a Tool. A set of tools for collecting, organizing, presenting and analyzing numerical facts or observations. Statistics as a Tool A set of tools for collecting, organizing, presenting and analyzing numerical facts or observations. Descriptive Statistics Numerical facts or observations that are organized describe

More information

Leveraging Interaction between Genetic Variants and Mammographic Findings for Personalized Breast Cancer Diagnosis

Leveraging Interaction between Genetic Variants and Mammographic Findings for Personalized Breast Cancer Diagnosis Leveraging Interaction between Genetic Variants and Mammographic Findings for Personalized Breast Cancer Diagnosis Jie Liu, PhD 1, Yirong Wu, PhD 1, Irene Ong, PhD 1, David Page, PhD 1, Peggy Peissig,

More information

Today: Binomial response variable with an explanatory variable on an ordinal (rank) scale.

Today: Binomial response variable with an explanatory variable on an ordinal (rank) scale. Model Based Statistics in Biology. Part V. The Generalized Linear Model. Single Explanatory Variable on an Ordinal Scale ReCap. Part I (Chapters 1,2,3,4), Part II (Ch 5, 6, 7) ReCap Part III (Ch 9, 10,

More information

Numerous hypothesis tests were performed in this study. To reduce the false positive due to

Numerous hypothesis tests were performed in this study. To reduce the false positive due to Two alternative data-splitting Numerous hypothesis tests were performed in this study. To reduce the false positive due to multiple testing, we are not only seeking the results with extremely small p values

More information

Ascertainment Through Family History of Disease Often Decreases the Power of Family-based Association Studies

Ascertainment Through Family History of Disease Often Decreases the Power of Family-based Association Studies Behav Genet (2007) 37:631 636 DOI 17/s10519-007-9149-0 ORIGINAL PAPER Ascertainment Through Family History of Disease Often Decreases the Power of Family-based Association Studies Manuel A. R. Ferreira

More information

Still important ideas

Still important ideas Readings: OpenStax - Chapters 1 13 & Appendix D & E (online) Plous Chapters 17 & 18 - Chapter 17: Social Influences - Chapter 18: Group Judgments and Decisions Still important ideas Contrast the measurement

More information

Estimation of Area under the ROC Curve Using Exponential and Weibull Distributions

Estimation of Area under the ROC Curve Using Exponential and Weibull Distributions XI Biennial Conference of the International Biometric Society (Indian Region) on Computational Statistics and Bio-Sciences, March 8-9, 22 43 Estimation of Area under the ROC Curve Using Exponential and

More information

Inbreeding and Inbreeding Depression

Inbreeding and Inbreeding Depression Inbreeding and Inbreeding Depression Inbreeding is mating among relatives which increases homozygosity Why is Inbreeding a Conservation Concern: Inbreeding may or may not lead to inbreeding depression,

More information

Pedigree Analysis Why do Pedigrees? Goals of Pedigree Analysis Basic Symbols More Symbols Y-Linked Inheritance

Pedigree Analysis Why do Pedigrees? Goals of Pedigree Analysis Basic Symbols More Symbols Y-Linked Inheritance Pedigree Analysis Why do Pedigrees? Punnett squares and chi-square tests work well for organisms that have large numbers of offspring and controlled mating, but humans are quite different: Small families.

More information

Lab 5: Testing Hypotheses about Patterns of Inheritance

Lab 5: Testing Hypotheses about Patterns of Inheritance Lab 5: Testing Hypotheses about Patterns of Inheritance How do we talk about genetic information? Each cell in living organisms contains DNA. DNA is made of nucleotide subunits arranged in very long strands.

More information

Study of cigarette sales in the United States Ge Cheng1, a,

Study of cigarette sales in the United States Ge Cheng1, a, 2nd International Conference on Economics, Management Engineering and Education Technology (ICEMEET 2016) 1Department Study of cigarette sales in the United States Ge Cheng1, a, of pure mathematics and

More information

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo Please note the page numbers listed for the Lind book may vary by a page or two depending on which version of the textbook you have. Readings: Lind 1 11 (with emphasis on chapters 10, 11) Please note chapter

More information

MEA DISCUSSION PAPERS

MEA DISCUSSION PAPERS Inference Problems under a Special Form of Heteroskedasticity Helmut Farbmacher, Heinrich Kögel 03-2015 MEA DISCUSSION PAPERS mea Amalienstr. 33_D-80799 Munich_Phone+49 89 38602-355_Fax +49 89 38602-390_www.mea.mpisoc.mpg.de

More information

Transmission Disequilibrium Methods for Family-Based Studies Daniel J. Schaid Technical Report #72 July, 2004

Transmission Disequilibrium Methods for Family-Based Studies Daniel J. Schaid Technical Report #72 July, 2004 Transmission Disequilibrium Methods for Family-Based Studies Daniel J. Schaid Technical Report #72 July, 2004 Correspondence to: Daniel J. Schaid, Ph.D., Harwick 775, Division of Biostatistics Mayo Clinic/Foundation,

More information

The Association Design and a Continuous Phenotype

The Association Design and a Continuous Phenotype PSYC 5102: Association Design & Continuous Phenotypes (4/4/07) 1 The Association Design and a Continuous Phenotype The purpose of this note is to demonstrate how to perform a population-based association

More information

Effects of Stratification in the Analysis of Affected-Sib-Pair Data: Benefits and Costs

Effects of Stratification in the Analysis of Affected-Sib-Pair Data: Benefits and Costs Am. J. Hum. Genet. 66:567 575, 2000 Effects of Stratification in the Analysis of Affected-Sib-Pair Data: Benefits and Costs Suzanne M. Leal and Jurg Ott Laboratory of Statistical Genetics, The Rockefeller

More information

Association between the XPG gene Asp1104His polymorphism and lung cancer risk

Association between the XPG gene Asp1104His polymorphism and lung cancer risk Association between the XPG gene Asp1104His polymorphism and lung cancer risk B. Zhou, X.M. Hu and G.Y. Wu Department of Thoracic Surgery, Hunan Provincial Tumor Hospital, Changsha, China Corresponding

More information

Still important ideas

Still important ideas Readings: OpenStax - Chapters 1 11 + 13 & Appendix D & E (online) Plous - Chapters 2, 3, and 4 Chapter 2: Cognitive Dissonance, Chapter 3: Memory and Hindsight Bias, Chapter 4: Context Dependence Still

More information

Lec 02: Estimation & Hypothesis Testing in Animal Ecology

Lec 02: Estimation & Hypothesis Testing in Animal Ecology Lec 02: Estimation & Hypothesis Testing in Animal Ecology Parameter Estimation from Samples Samples We typically observe systems incompletely, i.e., we sample according to a designed protocol. We then

More information

A re-randomisation design for clinical trials

A re-randomisation design for clinical trials Kahan et al. BMC Medical Research Methodology (2015) 15:96 DOI 10.1186/s12874-015-0082-2 RESEARCH ARTICLE Open Access A re-randomisation design for clinical trials Brennan C Kahan 1*, Andrew B Forbes 2,

More information

Citation for published version (APA): Ebbes, P. (2004). Latent instrumental variables: a new approach to solve for endogeneity s.n.

Citation for published version (APA): Ebbes, P. (2004). Latent instrumental variables: a new approach to solve for endogeneity s.n. University of Groningen Latent instrumental variables Ebbes, P. IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's PDF) if you wish to cite from it. Please check the document

More information

Ch 8 Practice Questions

Ch 8 Practice Questions Ch 8 Practice Questions Multiple Choice Identify the choice that best completes the statement or answers the question. 1. What fraction of offspring of the cross Aa Aa is homozygous for the dominant allele?

More information

Power Calculation for Testing If Disease is Associated with Marker in a Case-Control Study Using the GeneticsDesign Package

Power Calculation for Testing If Disease is Associated with Marker in a Case-Control Study Using the GeneticsDesign Package Power Calculation for Testing If Disease is Associated with Marker in a Case-Control Study Using the GeneticsDesign Package Weiliang Qiu email: weiliang.qiu@gmail.com Ross Lazarus email: ross.lazarus@channing.harvard.edu

More information

New Enhancements: GWAS Workflows with SVS

New Enhancements: GWAS Workflows with SVS New Enhancements: GWAS Workflows with SVS August 9 th, 2017 Gabe Rudy VP Product & Engineering 20 most promising Biotech Technology Providers Top 10 Analytics Solution Providers Hype Cycle for Life sciences

More information

CS2220 Introduction to Computational Biology

CS2220 Introduction to Computational Biology CS2220 Introduction to Computational Biology WEEK 8: GENOME-WIDE ASSOCIATION STUDIES (GWAS) 1 Dr. Mengling FENG Institute for Infocomm Research Massachusetts Institute of Technology mfeng@mit.edu PLANS

More information

Statistical questions for statistical methods

Statistical questions for statistical methods Statistical questions for statistical methods Unpaired (two-sample) t-test DECIDE: Does the numerical outcome have a relationship with the categorical explanatory variable? Is the mean of the outcome the

More information

Title: A robustness study of parametric and non-parametric tests in Model-Based Multifactor Dimensionality Reduction for epistasis detection

Title: A robustness study of parametric and non-parametric tests in Model-Based Multifactor Dimensionality Reduction for epistasis detection Author's response to reviews Title: A robustness study of parametric and non-parametric tests in Model-Based Multifactor Dimensionality Reduction for epistasis detection Authors: Jestinah M Mahachie John

More information

Nonparametric Linkage Analysis. Nonparametric Linkage Analysis

Nonparametric Linkage Analysis. Nonparametric Linkage Analysis Limitations of Parametric Linkage Analysis We previously discued parametric linkage analysis Genetic model for the disease must be specified: allele frequency parameters and penetrance parameters Lod scores

More information

Catherine A. Welch 1*, Séverine Sabia 1,2, Eric Brunner 1, Mika Kivimäki 1 and Martin J. Shipley 1

Catherine A. Welch 1*, Séverine Sabia 1,2, Eric Brunner 1, Mika Kivimäki 1 and Martin J. Shipley 1 Welch et al. BMC Medical Research Methodology (2018) 18:89 https://doi.org/10.1186/s12874-018-0548-0 RESEARCH ARTICLE Open Access Does pattern mixture modelling reduce bias due to informative attrition

More information

Midterm Exam ANSWERS Categorical Data Analysis, CHL5407H

Midterm Exam ANSWERS Categorical Data Analysis, CHL5407H Midterm Exam ANSWERS Categorical Data Analysis, CHL5407H 1. Data from a survey of women s attitudes towards mammography are provided in Table 1. Women were classified by their experience with mammography

More information

CONTINUOUS AND CATEGORICAL TREND ESTIMATORS: SIMULATION RESULTS AND AN APPLICATION TO RESIDENTIAL RADON

CONTINUOUS AND CATEGORICAL TREND ESTIMATORS: SIMULATION RESULTS AND AN APPLICATION TO RESIDENTIAL RADON CONTINUOUS AND CATEGORICAL TREND ESTIMATORS: SIMULATION RESULTS AND AN APPLICATION TO RESIDENTIAL RADON A Schaffrath Rosario 1,2*, J Wellmann 1,3, IM Heid 1 and HE Wichmann 1,2 1 Institute of Epidemiology,

More information

Readings: Textbook readings: OpenStax - Chapters 1 11 Online readings: Appendix D, E & F Plous Chapters 10, 11, 12 and 14

Readings: Textbook readings: OpenStax - Chapters 1 11 Online readings: Appendix D, E & F Plous Chapters 10, 11, 12 and 14 Readings: Textbook readings: OpenStax - Chapters 1 11 Online readings: Appendix D, E & F Plous Chapters 10, 11, 12 and 14 Still important ideas Contrast the measurement of observable actions (and/or characteristics)

More information

OLS Regression with Clustered Data

OLS Regression with Clustered Data OLS Regression with Clustered Data Analyzing Clustered Data with OLS Regression: The Effect of a Hierarchical Data Structure Daniel M. McNeish University of Maryland, College Park A previous study by Mundfrom

More information

Research and Evaluation Methodology Program, School of Human Development and Organizational Studies in Education, University of Florida

Research and Evaluation Methodology Program, School of Human Development and Organizational Studies in Education, University of Florida Vol. 2 (1), pp. 22-39, Jan, 2015 http://www.ijate.net e-issn: 2148-7456 IJATE A Comparison of Logistic Regression Models for Dif Detection in Polytomous Items: The Effect of Small Sample Sizes and Non-Normality

More information

Readings: Textbook readings: OpenStax - Chapters 1 13 (emphasis on Chapter 12) Online readings: Appendix D, E & F

Readings: Textbook readings: OpenStax - Chapters 1 13 (emphasis on Chapter 12) Online readings: Appendix D, E & F Readings: Textbook readings: OpenStax - Chapters 1 13 (emphasis on Chapter 12) Online readings: Appendix D, E & F Plous Chapters 17 & 18 Chapter 17: Social Influences Chapter 18: Group Judgments and Decisions

More information

Association between ERCC1 and ERCC2 gene polymorphisms and susceptibility to pancreatic cancer

Association between ERCC1 and ERCC2 gene polymorphisms and susceptibility to pancreatic cancer Association between ERCC1 and ERCC2 gene polymorphisms and susceptibility to pancreatic cancer M.G. He, K. Zheng, D. Tan and Z.X. Wang Department of Hepatobiliary Surgery, Nuclear Industry 215 Hospital

More information

On Missing Data and Genotyping Errors in Association Studies

On Missing Data and Genotyping Errors in Association Studies On Missing Data and Genotyping Errors in Association Studies Department of Biostatistics Johns Hopkins Bloomberg School of Public Health May 16, 2008 Specific Aims of our R01 1 Develop and evaluate new

More information

Investigating the robustness of the nonparametric Levene test with more than two groups

Investigating the robustness of the nonparametric Levene test with more than two groups Psicológica (2014), 35, 361-383. Investigating the robustness of the nonparametric Levene test with more than two groups David W. Nordstokke * and S. Mitchell Colp University of Calgary, Canada Testing

More information

Chapter 11: Advanced Remedial Measures. Weighted Least Squares (WLS)

Chapter 11: Advanced Remedial Measures. Weighted Least Squares (WLS) Chapter : Advanced Remedial Measures Weighted Least Squares (WLS) When the error variance appears nonconstant, a transformation (of Y and/or X) is a quick remedy. But it may not solve the problem, or it

More information

Today Retrospective analysis of binomial response across two levels of a single factor.

Today Retrospective analysis of binomial response across two levels of a single factor. Model Based Statistics in Biology. Part V. The Generalized Linear Model. Chapter 18.3 Single Factor. Retrospective Analysis ReCap. Part I (Chapters 1,2,3,4), Part II (Ch 5, 6, 7) ReCap Part III (Ch 9,

More information

11/24/2017. Do not imply a cause-and-effect relationship

11/24/2017. Do not imply a cause-and-effect relationship Correlational research is used to describe the relationship between two or more naturally occurring variables. Is age related to political conservativism? Are highly extraverted people less afraid of rejection

More information

11/18/2013. Correlational Research. Correlational Designs. Why Use a Correlational Design? CORRELATIONAL RESEARCH STUDIES

11/18/2013. Correlational Research. Correlational Designs. Why Use a Correlational Design? CORRELATIONAL RESEARCH STUDIES Correlational Research Correlational Designs Correlational research is used to describe the relationship between two or more naturally occurring variables. Is age related to political conservativism? Are

More information

How to analyze correlated and longitudinal data?

How to analyze correlated and longitudinal data? How to analyze correlated and longitudinal data? Niloofar Ramezani, University of Northern Colorado, Greeley, Colorado ABSTRACT Longitudinal and correlated data are extensively used across disciplines

More information

Daniel Boduszek University of Huddersfield

Daniel Boduszek University of Huddersfield Daniel Boduszek University of Huddersfield d.boduszek@hud.ac.uk Introduction to Logistic Regression SPSS procedure of LR Interpretation of SPSS output Presenting results from LR Logistic regression is

More information

Supplementary Online Content

Supplementary Online Content Supplementary Online Content Hartwig FP, Borges MC, Lessa Horta B, Bowden J, Davey Smith G. Inflammatory biomarkers and risk of schizophrenia: a 2-sample mendelian randomization study. JAMA Psychiatry.

More information

Assessing publication bias in genetic association studies: evidence from a recent meta-analysis

Assessing publication bias in genetic association studies: evidence from a recent meta-analysis Psychiatry Research 129 (2004) 39 44 www.elsevier.com/locate/psychres Assessing publication bias in genetic association studies: evidence from a recent meta-analysis Marcus R. Munafò a, *, Taane G. Clark

More information

Tutorial on Genome-Wide Association Studies

Tutorial on Genome-Wide Association Studies Tutorial on Genome-Wide Association Studies Assistant Professor Institute for Computational Biology Department of Epidemiology and Biostatistics Case Western Reserve University Acknowledgements Dana Crawford

More information

Choice of axis, tests for funnel plot asymmetry, and methods to adjust for publication bias

Choice of axis, tests for funnel plot asymmetry, and methods to adjust for publication bias Technical appendix Choice of axis, tests for funnel plot asymmetry, and methods to adjust for publication bias Choice of axis in funnel plots Funnel plots were first used in educational research and psychology,

More information

Lab Activity Report: Mendelian Genetics - Genetic Disorders

Lab Activity Report: Mendelian Genetics - Genetic Disorders Name Date Period Lab Activity Report: Mendelian Genetics - Genetic Disorders Background: Sometimes genetic disorders are caused by mutations to normal genes. When the mutation has been in the population

More information

HRAS1 Rare Minisatellite Alleles and Breast Cancer in Australian Women Under Age Forty Years

HRAS1 Rare Minisatellite Alleles and Breast Cancer in Australian Women Under Age Forty Years HRAS1 Rare Minisatellite Alleles and Breast Cancer in Australian Women Under Age Forty Years Frank A. Firgaira, Ram Seshadri, Christopher R. E. McEvoy, Gillian S. Dite, Graham G. Giles, Margaret R. E.

More information

Performance of the Trim and Fill Method in Adjusting for the Publication Bias in Meta-Analysis of Continuous Data

Performance of the Trim and Fill Method in Adjusting for the Publication Bias in Meta-Analysis of Continuous Data American Journal of Applied Sciences 9 (9): 1512-1517, 2012 ISSN 1546-9239 2012 Science Publication Performance of the Trim and Fill Method in Adjusting for the Publication Bias in Meta-Analysis of Continuous

More information

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo Please note the page numbers listed for the Lind book may vary by a page or two depending on which version of the textbook you have. Readings: Lind 1 11 (with emphasis on chapters 5, 6, 7, 8, 9 10 & 11)

More information

Two-stage Methods to Implement and Analyze the Biomarker-guided Clinical Trail Designs in the Presence of Biomarker Misclassification

Two-stage Methods to Implement and Analyze the Biomarker-guided Clinical Trail Designs in the Presence of Biomarker Misclassification RESEARCH HIGHLIGHT Two-stage Methods to Implement and Analyze the Biomarker-guided Clinical Trail Designs in the Presence of Biomarker Misclassification Yong Zang 1, Beibei Guo 2 1 Department of Mathematical

More information

Supplementary Figures

Supplementary Figures Supplementary Figures Supplementary Fig 1. Comparison of sub-samples on the first two principal components of genetic variation. TheBritishsampleisplottedwithredpoints.The sub-samples of the diverse sample

More information

DOES THE BRCAX GENE EXIST? FUTURE OUTLOOK

DOES THE BRCAX GENE EXIST? FUTURE OUTLOOK CHAPTER 6 DOES THE BRCAX GENE EXIST? FUTURE OUTLOOK Genetic research aimed at the identification of new breast cancer susceptibility genes is at an interesting crossroad. On the one hand, the existence

More information

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo Business Statistics The following was provided by Dr. Suzanne Delaney, and is a comprehensive review of Business Statistics. The workshop instructor will provide relevant examples during the Skills Assessment

More information

Gregor Mendel and Genetics Worksheets

Gregor Mendel and Genetics Worksheets Gregor Mendel and Genetics Worksheets Douglas Wilkin, Ph.D. (DWilkin) Say Thanks to the Authors Click http://www.ck12.org/saythanks (No sign in required) To access a customizable version of this book,

More information

STATISTICS AND RESEARCH DESIGN

STATISTICS AND RESEARCH DESIGN Statistics 1 STATISTICS AND RESEARCH DESIGN These are subjects that are frequently confused. Both subjects often evoke student anxiety and avoidance. To further complicate matters, both areas appear have

More information

Lessons in biostatistics

Lessons in biostatistics Lessons in biostatistics The test of independence Mary L. McHugh Department of Nursing, School of Health and Human Services, National University, Aero Court, San Diego, California, USA Corresponding author:

More information

An Introduction to Quantitative Genetics I. Heather A Lawson Advanced Genetics Spring2018

An Introduction to Quantitative Genetics I. Heather A Lawson Advanced Genetics Spring2018 An Introduction to Quantitative Genetics I Heather A Lawson Advanced Genetics Spring2018 Outline What is Quantitative Genetics? Genotypic Values and Genetic Effects Heritability Linkage Disequilibrium

More information

Model of an F 1 and F 2 generation

Model of an F 1 and F 2 generation Mendelian Genetics Casual observation of a population of organisms (e.g. cats) will show variation in many visible characteristics (e.g. color of fur). While members of a species will have the same number

More information

Systems of Mating: Systems of Mating:

Systems of Mating: Systems of Mating: 8/29/2 Systems of Mating: the rules by which pairs of gametes are chosen from the local gene pool to be united in a zygote with respect to a particular locus or genetic system. Systems of Mating: A deme

More information

Cancer develops after somatic mutations overcome the multiple

Cancer develops after somatic mutations overcome the multiple Genetic variation in cancer predisposition: Mutational decay of a robust genetic control network Steven A. Frank* Department of Ecology and Evolutionary Biology, University of California, Irvine, CA 92697-2525

More information

ELIMINATING BIAS IN CANCER RISK ESTIMATES A SIMULATION STUDY

ELIMINATING BIAS IN CANCER RISK ESTIMATES A SIMULATION STUDY ELIMINATING BIAS IN CANCER RISK ESTIMATES A SIMULATION STUDY by SARADHA RAJAMANI A Project submitted to the faculty of The University of Utah in partial fulfillment of the requirements for the degree of

More information

Appropriate Statistical Methods to Account for Similarities in Binary Outcomes Between Fellow Eyes

Appropriate Statistical Methods to Account for Similarities in Binary Outcomes Between Fellow Eyes Appropriate Statistical Methods to Account for Similarities in Binary Outcomes Between Fellow Eyes Joanne Katz,* Scott Zeger,-\ and Kung-Yee Liangf Purpose. Many ocular measurements are more alike between

More information

HARDY- WEINBERG PRACTICE PROBLEMS

HARDY- WEINBERG PRACTICE PROBLEMS HARDY- WEINBERG PRACTICE PROBLEMS PROBLEMS TO SOLVE: 1. The proportion of homozygous recessives of a certain population is 0.09. If we assume that the gene pool is large and at equilibrium and all genotypes

More information

Statistical power and significance testing in large-scale genetic studies

Statistical power and significance testing in large-scale genetic studies STUDY DESIGNS Statistical power and significance testing in large-scale genetic studies Pak C. Sham 1 and Shaun M. Purcell 2,3 Abstract Significance testing was developed as an objective method for summarizing

More information

Glossary From Running Randomized Evaluations: A Practical Guide, by Rachel Glennerster and Kudzai Takavarasha

Glossary From Running Randomized Evaluations: A Practical Guide, by Rachel Glennerster and Kudzai Takavarasha Glossary From Running Randomized Evaluations: A Practical Guide, by Rachel Glennerster and Kudzai Takavarasha attrition: When data are missing because we are unable to measure the outcomes of some of the

More information

SPRING GROVE AREA SCHOOL DISTRICT. Course Description. Instructional Strategies, Learning Practices, Activities, and Experiences.

SPRING GROVE AREA SCHOOL DISTRICT. Course Description. Instructional Strategies, Learning Practices, Activities, and Experiences. SPRING GROVE AREA SCHOOL DISTRICT PLANNED COURSE OVERVIEW Course Title: Basic Introductory Statistics Grade Level(s): 11-12 Units of Credit: 1 Classification: Elective Length of Course: 30 cycles Periods

More information

BST227 Introduction to Statistical Genetics. Lecture 4: Introduction to linkage and association analysis

BST227 Introduction to Statistical Genetics. Lecture 4: Introduction to linkage and association analysis BST227 Introduction to Statistical Genetics Lecture 4: Introduction to linkage and association analysis 1 Housekeeping Homework #1 due today Homework #2 posted (due Monday) Lab at 5:30PM today (FXB G13)

More information

Econometric Game 2012: infants birthweight?

Econometric Game 2012: infants birthweight? Econometric Game 2012: How does maternal smoking during pregnancy affect infants birthweight? Case A April 18, 2012 1 Introduction Low birthweight is associated with adverse health related and economic

More information

Correlation and regression

Correlation and regression PG Dip in High Intensity Psychological Interventions Correlation and regression Martin Bland Professor of Health Statistics University of York http://martinbland.co.uk/ Correlation Example: Muscle strength

More information

Analysis of Rheumatoid Arthritis Data using Logistic Regression and Penalized Approach

Analysis of Rheumatoid Arthritis Data using Logistic Regression and Penalized Approach University of South Florida Scholar Commons Graduate Theses and Dissertations Graduate School November 2015 Analysis of Rheumatoid Arthritis Data using Logistic Regression and Penalized Approach Wei Chen

More information

Bias in randomised factorial trials

Bias in randomised factorial trials Research Article Received 17 September 2012, Accepted 9 May 2013 Published online 4 June 2013 in Wiley Online Library (wileyonlinelibrary.com) DOI: 10.1002/sim.5869 Bias in randomised factorial trials

More information

Chapter 17 Sensitivity Analysis and Model Validation

Chapter 17 Sensitivity Analysis and Model Validation Chapter 17 Sensitivity Analysis and Model Validation Justin D. Salciccioli, Yves Crutain, Matthieu Komorowski and Dominic C. Marshall Learning Objectives Appreciate that all models possess inherent limitations

More information

NORTH SOUTH UNIVERSITY TUTORIAL 2

NORTH SOUTH UNIVERSITY TUTORIAL 2 NORTH SOUTH UNIVERSITY TUTORIAL 2 AHMED HOSSAIN,PhD Data Management and Analysis AHMED HOSSAIN,PhD - Data Management and Analysis 1 Correlation Analysis INTRODUCTION In correlation analysis, we estimate

More information

Sensitivity Analysis in Observational Research: Introducing the E-value

Sensitivity Analysis in Observational Research: Introducing the E-value Sensitivity Analysis in Observational Research: Introducing the E-value Tyler J. VanderWeele Harvard T.H. Chan School of Public Health Departments of Epidemiology and Biostatistics 1 Plan of Presentation

More information

Chapter 23. Inference About Means. Copyright 2010 Pearson Education, Inc.

Chapter 23. Inference About Means. Copyright 2010 Pearson Education, Inc. Chapter 23 Inference About Means Copyright 2010 Pearson Education, Inc. Getting Started Now that we know how to create confidence intervals and test hypotheses about proportions, it d be nice to be able

More information

Russian Journal of Agricultural and Socio-Economic Sciences, 3(15)

Russian Journal of Agricultural and Socio-Economic Sciences, 3(15) ON THE COMPARISON OF BAYESIAN INFORMATION CRITERION AND DRAPER S INFORMATION CRITERION IN SELECTION OF AN ASYMMETRIC PRICE RELATIONSHIP: BOOTSTRAP SIMULATION RESULTS Henry de-graft Acquah, Senior Lecturer

More information

MULTIPLE LINEAR REGRESSION 24.1 INTRODUCTION AND OBJECTIVES OBJECTIVES

MULTIPLE LINEAR REGRESSION 24.1 INTRODUCTION AND OBJECTIVES OBJECTIVES 24 MULTIPLE LINEAR REGRESSION 24.1 INTRODUCTION AND OBJECTIVES In the previous chapter, simple linear regression was used when you have one independent variable and one dependent variable. This chapter

More information

Use of allele scores as instrumental variables for Mendelian randomization

Use of allele scores as instrumental variables for Mendelian randomization Use of allele scores as instrumental variables for Mendelian randomization Stephen Burgess Simon G. Thompson October 5, 2012 Summary Background: An allele score is a single variable summarizing multiple

More information

Statistical analysis DIANA SAPLACAN 2017 * SLIDES ADAPTED BASED ON LECTURE NOTES BY ALMA LEORA CULEN

Statistical analysis DIANA SAPLACAN 2017 * SLIDES ADAPTED BASED ON LECTURE NOTES BY ALMA LEORA CULEN Statistical analysis DIANA SAPLACAN 2017 * SLIDES ADAPTED BASED ON LECTURE NOTES BY ALMA LEORA CULEN Vs. 2 Background 3 There are different types of research methods to study behaviour: Descriptive: observations,

More information