Ridge regression for risk prediction

1 Ridge regression for risk prediction with applications to genetic data Erika Cule and Maria De Iorio Imperial College London Department of Epidemiology and Biostatistics School of Public Health May 2012

2 Outline: 1. Risk Prediction using Genetic Data; 2. Methods and challenges; 3. Ridge Regression (Shrinkage parameter, Significance testing); 4. Conclusions

5 Risk Prediction using Genetic Data In the decade following the publication of the first draft of the Human Genome Sequence, genome-wide association studies have identified thousands of genetic variants associated with hundreds of diseases and traits.

7 Risk Prediction using Genetic Data However, clinicians are getting impatient about the utility of these identified variants for risk prediction in complex diseases:

11 Risk prediction using genetic data Recently, questions have been raised about the potential utility of genetic risk prediction for complex diseases (Clayton, 2009). The aim here is to make ridge regression possible for genetic data in a semi-automatic way. The framework that we propose allows for the simultaneous inclusion of all predictors genome-wide in a regression model. Our approach is appropriate where there are many predictors of small effect size, which is thought to be the case in genetic data.

13 Univariate tests of association [Figure: genome-wide scan for seven diseases (WTCCC, 2007). For each disease, the -log10 of the trend test P value for quality-control-positive SNPs is plotted against position on each chromosome. Panels visible in the transcription include bipolar disorder, coronary artery disease, Crohn's disease, type 1 diabetes and type 2 diabetes. Chromosomes are shown in alternating colours for clarity. All panels are truncated at -log10(P value) = 15, although some markers (for example, in T1D and RA) exceed this threshold. Nature Publishing Group.]

20 Multivariate methods Consider all SNPs jointly. Standard multivariate methods cannot be used with modern genetic data sets, which have p >> n (many more predictors p than individuals n). Typically, additional (non-genetic) covariates are included in the analysis, further increasing the dimensionality of the data. Penalized regression methods constrain the size of the maximum likelihood estimates of the regression coefficients. They are also known as shrinkage methods because they shrink the regression coefficients towards zero. A number of penalized regression approaches have been proposed in the literature: lasso regression, HyperLasso, elastic net and ridge regression.
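
To make the p >> n obstacle concrete, here is a minimal sketch on simulated data. The sizes, the random seed and the value k = 1 are arbitrary assumptions for illustration and are not taken from the talk. It shows that X'X has rank at most n, so ordinary least squares has no unique solution, whereas adding a ridge penalty makes the matrix positive definite.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 100                       # far more predictors than observations
X = rng.standard_normal((n, p))

gram = X.T @ X                       # p x p matrix that OLS would need to invert
rank = int(np.linalg.matrix_rank(gram))   # at most n, so gram is singular when p > n

k = 1.0                              # illustrative shrinkage parameter (assumption)
penalized = gram + k * np.eye(p)     # the ridge penalty adds k to every eigenvalue
min_eig = float(np.linalg.eigvalsh(penalized).min())

print(rank, min_eig)                 # rank is below p; smallest eigenvalue is about k
```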

21 Prior distributions in Lasso and Ridge Regression
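
As a reminder of why the two penalties behave differently, the ridge and lasso estimates can be read as posterior modes. This is a standard result, stated here assuming a Gaussian likelihood with known error variance and independent zero-mean priors; the scale parameters τ and b and the symbol λ are notation introduced for this note, not taken from the slide.

```latex
\begin{align*}
\text{Ridge: } \beta_j \sim N(0, \tau^2)
  &\;\Rightarrow\; \hat\beta = \arg\min_\beta \|y - X\beta\|^2 + k \sum_{j=1}^{p} \beta_j^2,
  \quad k = \sigma^2 / \tau^2, \\
\text{Lasso: } \beta_j \sim \text{Laplace}(0, b)
  &\;\Rightarrow\; \hat\beta = \arg\min_\beta \|y - X\beta\|^2 + \lambda \sum_{j=1}^{p} |\beta_j|,
  \quad \lambda = 2\sigma^2 / b.
\end{align*}
```

The Gaussian prior is smooth at zero, so the ridge estimate shrinks coefficients without setting them exactly to zero; the Laplace prior is peaked at zero, which is what allows the lasso to produce exact zeros.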

23 Ridge regression Ridge regression (Hoerl & Kennard, 1970) is a penalized regression approach proposed to overcome the problems associated with multicollinearity among predictors in multiple regression. Among penalized regression approaches, ridge regression has been shown to offer very good predictive performance (Frank & Friedman, 1993). We applied ridge regression to the problem of risk prediction using genetic data obtained from genome-wide association studies. Ridge regression shrinks the squared length of the regression coefficient vector, which corresponds to a quadratic penalty on the coefficients.
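
The effect of the quadratic penalty can be sketched with the closed-form ridge estimate (X'X + kI)^{-1} X'y. The helper name `ridge_fit`, the simulated data and the chosen values of k are assumptions made for this illustration.

```python
import numpy as np

def ridge_fit(X, y, k):
    """Ridge estimate (X'X + kI)^{-1} X'y for a shrinkage parameter k >= 0."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

rng = np.random.default_rng(1)
n, p = 50, 5
X = rng.standard_normal((n, p))
X[:, 1] = X[:, 0] + 0.01 * rng.standard_normal(n)   # two nearly collinear predictors
y = X @ np.array([1.0, 1.0, 0.5, 0.0, -0.5]) + rng.standard_normal(n)

for k in (0.0, 1.0, 10.0):
    beta = ridge_fit(X, y, k)
    print(k, float(beta @ beta))     # squared length of the coefficient vector
```

The printed squared length decreases as k grows, which is the shrinkage described on the slide; k = 0 reproduces ordinary least squares and is only usable here because n > p.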

26 Shrinkage parameter Controls the degree of shrinkage of the regression coefficients. A larger shrinkage parameter shrinks the coefficients further towards zero. Data-driven methods proposed in the literature cannot be applied when p >> n, because they depend on the ordinary least squares estimates. Ridge trace (graphical method).
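
A ridge trace is simply the set of coefficient estimates computed over a grid of shrinkage parameters. The sketch below computes the values one would plot; the function name `ridge_trace`, the grid and the simulated data are assumptions for illustration, and plotting is left out to keep the example dependency-free.

```python
import numpy as np

def ridge_trace(X, y, ks):
    """Ridge coefficients for each k in ks (all k > 0); row i corresponds to ks[i]."""
    U, d, Vt = np.linalg.svd(X, full_matrices=False)
    uty = U.T @ y
    # beta_k = V diag(d / (d^2 + k)) U'y, so a single SVD serves every k
    return np.array([Vt.T @ (d / (d**2 + k) * uty) for k in ks])

rng = np.random.default_rng(2)
n, p = 30, 100                        # p > n is fine because every k is positive
X = rng.standard_normal((n, p))
y = X[:, 0] - X[:, 1] + rng.standard_normal(n)

ks = np.logspace(-2, 3, 6)            # 0.01, 0.1, ..., 1000
trace = ridge_trace(X, y, ks)         # shape (6, 100); plot each column against ks
```

Reusing one singular value decomposition for all k avoids refitting the model at every grid point, which matters when p is large.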

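30 Our starting point
Linear model: $Y = X\beta + \varepsilon$, with $\varepsilon \overset{iid}{\sim} N(0, \sigma^2)$.
Ridge regression: $\hat\beta_k = \arg\min_\beta \left\{ \sum_{i=1}^{n} \left( y_i - \sum_{j=1}^{p} \beta_j x_{ij} \right)^2 + k \sum_{j=1}^{p} \beta_j^2 \right\}$
Proposed by Hoerl, Kennard & Baldwin (1975): $k_{HKB} = \dfrac{p \hat\sigma^2}{\hat\beta' \hat\beta}$, with $\hat\sigma^2$ and $\hat\beta$ estimated from ordinary least squares (OLS).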
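
The Hoerl, Kennard & Baldwin parameter can be computed directly from an OLS fit, as in this minimal sketch. The function name `k_hkb`, the use of the n - p residual degrees of freedom for the variance estimate and the simulated data are assumptions for illustration. The function needs n > p, which is exactly the restriction the talk sets out to remove.

```python
import numpy as np

def k_hkb(X, y):
    """k_HKB = p * sigma^2 / (beta' beta), with both quantities taken from OLS (needs n > p)."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - p)   # residual variance estimate (assumed n - p divisor)
    return p * sigma2 / (beta @ beta)

rng = np.random.default_rng(3)
n, p = 100, 8
X = rng.standard_normal((n, p))
y = X @ np.full(p, 0.3) + rng.standard_normal(n)

k = k_hkb(X, y)
print(k)
```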

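35 We observed
Linear model: $Y = X\beta + \varepsilon = Z\alpha + \varepsilon$, with $\varepsilon \overset{iid}{\sim} N(0, \sigma^2)$.
Proposed by Hoerl, Kennard & Baldwin (1975): $k_{HKB} = \dfrac{p \hat\sigma^2}{\hat\beta' \hat\beta} = \dfrac{p \hat\sigma^2}{\hat\alpha' \hat\alpha}$
$\hat\alpha$ are principal components regression (PCR) coefficients. PCR coefficients are available when $p \gg n$.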
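
The equality of the two denominators follows because the principal component scores are an orthogonal rotation of the predictors. The sketch below checks this numerically, assuming Z = XV where V holds the right singular vectors of a column-centred X; the simulated data and variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 40, 6
X = rng.standard_normal((n, p))
X -= X.mean(axis=0)                   # centre the predictors
y = X @ rng.standard_normal(p) + rng.standard_normal(n)

U, d, Vt = np.linalg.svd(X, full_matrices=False)
Z = X @ Vt.T                          # principal component scores, Z = XV = UD
alpha_hat = (U.T @ y) / d             # OLS of y on Z, since Z'Z = diag(d^2)
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# The two squared lengths agree because beta_hat = V alpha_hat and V is orthogonal
print(float(beta_hat @ beta_hat), float(alpha_hat @ alpha_hat))
```

Each element of alpha_hat depends only on one singular vector and singular value, so the leading coefficients remain computable when p exceeds n even though beta_hat is not.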

38 We propose
$k_{HKB} = \dfrac{p \hat\sigma^2}{\hat\alpha' \hat\alpha}$, $\quad k_r = \dfrac{r \hat\sigma_r^2}{\hat\alpha_r' \hat\alpha_r}$
Harmonic mean of the ideal shrinkage parameters of the PCR coefficients, with coefficients replaced by their ordinary least squares estimates. How many components?
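
A minimal sketch of the proposed parameter based on the first r principal components follows. It assumes that the variance estimate is the residual variance of the r-component PCR fit with n - r degrees of freedom; that choice, the function name `k_r` and the simulated data are assumptions of this example rather than details confirmed by the slides.

```python
import numpy as np

def k_r(X, y, r):
    """k_r = r * sigma_r^2 / (alpha_r' alpha_r) from the first r principal components."""
    n = X.shape[0]
    U, d, _ = np.linalg.svd(X, full_matrices=False)
    alpha_r = (U[:, :r].T @ y) / d[:r]            # first r PCR coefficients
    resid = y - U[:, :r] @ (d[:r] * alpha_r)      # y - Z_r alpha_r
    sigma2_r = resid @ resid / (n - r)            # assumed variance estimate
    return r * sigma2_r / (alpha_r @ alpha_r)

rng = np.random.default_rng(5)
n, p = 30, 200                                    # p >> n, where k_HKB is unavailable
X = rng.standard_normal((n, p))
y = X[:, :5] @ np.full(5, 0.5) + rng.standard_normal(n)

k5 = k_r(X, y, r=5)
print(k5)
```

With n > p and r = p, this expression reduces to the Hoerl, Kennard & Baldwin value computed from OLS.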

39 How many components? [Figure: % of replicates with larger MSE using $k_{HKB}$ than using $k_r$, shown against signal-to-noise ratio and number of PCs (r).]

40 How many components? Most of the variance in genetic data can be explained by the first few principal components.

41 How many components?
$PSE = \left\{ 1 + \dfrac{\mathrm{tr}(HH')}{n} \right\} \sigma^2 + \dfrac{b'b}{n} = \text{variance} + \dfrac{\text{bias}^2}{n}$
$H$ is the hat matrix: $\hat{Y} = HY$. Degrees of freedom for variance = $\mathrm{tr}(HH')$ (Hastie & Tibshirani, 1990).

45 How many components? For given r, RR estimates have less bias than PCR estimates. PCR using r components has r degrees of freedom for variance. We fixed r such that the degrees of freedom of the ridge model using r components equals r: $\mathrm{tr}(HH') = r$.
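
One way to read this rule is sketched below: for each candidate r, compute the shrinkage parameter from r components, evaluate tr(HH') for the resulting ridge fit, and keep the r whose degrees of freedom are closest to r. This is an interpretation of the slide rather than the authors' confirmed procedure; the function names, the n - r variance divisor, the search range and the simulated data are all assumptions.

```python
import numpy as np

def ridge_df(d, k):
    """Degrees of freedom for variance, tr(HH'), of a ridge fit with singular values d."""
    shrink = d**2 / (d**2 + k)        # eigenvalues of the ridge hat matrix H
    return float(np.sum(shrink**2))

def k_r(U, d, y, r):
    """Shrinkage parameter from the first r principal components (assumed n - r divisor)."""
    n = len(y)
    alpha_r = (U[:, :r].T @ y) / d[:r]
    resid = y - U[:, :r] @ (d[:r] * alpha_r)
    return r * (resid @ resid / (n - r)) / (alpha_r @ alpha_r)

def choose_r(X, y, max_r):
    """Return the r in 1..max_r whose ridge fit has tr(HH') closest to r."""
    U, d, _ = np.linalg.svd(X, full_matrices=False)
    gaps = [abs(ridge_df(d, k_r(U, d, y, r)) - r) for r in range(1, max_r + 1)]
    return 1 + int(np.argmin(gaps))

rng = np.random.default_rng(7)
n, p = 60, 300
X = rng.standard_normal((n, p))
y = X[:, :10] @ np.full(10, 0.4) + rng.standard_normal(n)

best_r = choose_r(X, y, max_r=20)
print(best_r)
```

Because H = U diag(d^2 / (d^2 + k)) U', the trace of HH' needs only the singular values, so the search never forms an n x n hat matrix.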

47 Simulation Study [Figures: mean prediction squared error; p-value trace.]

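48 Simulation study: performance comparison. Methods compared: SNP ranking followed by multivariate regression (Univariate), HyperLasso (HLasso) and ridge regression (RR), for continuous and binary outcomes. [Table: mean PSE for continuous outcomes and mean CE for binary outcomes. Columns: Univariate using the top 0.1%, 0.5%, 1%, 3% and 4% of SNPs ranked by univariate p-value; HLasso; RR. The table values did not survive transcription.]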

50 Bipolar Disorder Data Two GWAS of bipolar disorder: WTCCC and GAIN. These are case-control studies, so the model was extended to logistic ridge regression. SNPs were typed on different platforms; Impute2 was used to obtain common SNPs. When determining the shrinkage parameter, training data were thinned (1 SNP every 100kb). Univariate model: which significance threshold? HyperLasso: cross-validation to choose the parameters is computationally intensive. [Table: mean classification error for the univariate model (by p-value threshold), HyperLasso and ridge regression. The table values did not survive transcription.]
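
For a case-control outcome, a ridge-penalized logistic model can be fitted by Newton-Raphson, as in the minimal sketch below. It assumes the objective is the negative log-likelihood plus k times the squared length of the coefficient vector, omits the intercept, and uses simulated data; none of these choices is confirmed by the slides, and the function name `logistic_ridge` is hypothetical.

```python
import numpy as np

def logistic_ridge(X, y, k, n_iter=25):
    """Newton-Raphson for min over beta of -loglik(beta) + k * beta'beta (no intercept)."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(X @ beta)))       # fitted case probabilities
        W = prob * (1.0 - prob)                         # Newton weights
        grad = X.T @ (y - prob) - 2.0 * k * beta        # penalized score
        hess = X.T @ (X * W[:, None]) + 2.0 * k * np.eye(p)
        beta = beta + np.linalg.solve(hess, grad)
    return beta

rng = np.random.default_rng(8)
n, p = 300, 5
X = rng.standard_normal((n, p))
true_beta = np.array([1.0, -1.0, 0.5, 0.0, 0.0])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(X @ true_beta)))).astype(float)

beta_light = logistic_ridge(X, y, k=0.1)
beta_heavy = logistic_ridge(X, y, k=50.0)
train_error = float(np.mean((X @ beta_light > 0) != (y == 1)))
print(np.linalg.norm(beta_light), np.linalg.norm(beta_heavy), train_error)
```

The heavier penalty gives a shorter coefficient vector, mirroring the linear case; the classification error reported here is computed on the training data and is only a sanity check, not an estimate of predictive performance.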

54 Significance testing in ridge regression Ridge regression is not a variable selection method: the shrinkage penalty does not shrink any coefficient estimate to exactly zero. A test of significance of ridge regression coefficients had been proposed (Halawa & El Bassiouni, 2000) and applied (Malo et al., 2008) but not evaluated. We extended the test to be applicable when p >> n and in logistic ridge regression, and evaluated its performance on simulated and real data sets.

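55 Significance test
Based on a Wald test: $T_k = \dfrac{\hat\beta_k}{\mathrm{se}(\hat\beta_k)}$, with $T_k \sim N(0, 1)$ under $H_0$.
$\mathrm{se}(\hat\beta_k)$ is obtained from the covariance matrix $\mathrm{Var}(\hat\beta_k) = \hat\sigma^2 (X'X + kI)^{-1} X'X (X'X + kI)^{-1}$, taking into account both the correlation in the predictors and the amount of shrinkage.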
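
The test statistic can be sketched for the linear model as follows. The covariance formula is the one on the slide; the residual variance estimate, which here divides by n - tr(2H - HH'), is an assumption of this example, as are the function name `ridge_wald_test`, the simulated data and the two-sided normal p-values computed with `math.erfc`.

```python
import math
import numpy as np

def ridge_wald_test(X, y, k):
    """Return ridge coefficients, Wald statistics and two-sided normal p-values."""
    n, p = X.shape
    A = np.linalg.inv(X.T @ X + k * np.eye(p))
    beta = A @ X.T @ y
    H = X @ A @ X.T                                   # ridge hat matrix
    resid = y - H @ y
    df_resid = n - np.trace(2 * H - H @ H.T)          # assumed residual degrees of freedom
    sigma2 = resid @ resid / df_resid
    cov = sigma2 * A @ (X.T @ X) @ A                  # covariance formula from the slide
    T = beta / np.sqrt(np.diag(cov))
    pvals = np.array([math.erfc(abs(t) / math.sqrt(2)) for t in T])
    return beta, T, pvals

rng = np.random.default_rng(9)
n, p = 200, 10
X = rng.standard_normal((n, p))
y = 1.0 * X[:, 0] + rng.standard_normal(n)            # only the first predictor is causal

beta, T, pvals = ridge_wald_test(X, y, k=1.0)
print(int(np.argmin(pvals)), float(pvals[0]))
```

This version forms p x p and n x n matrices explicitly and is only meant for small problems; it does not reproduce the p >> n extension mentioned in the talk.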

56 Simulation study: causal SNP [Figure: histogram of the coefficient estimate (frequency) and distribution of the test statistic T (probability); one panel annotation reads p = 1.07e-08.]

57 Simulation study: non-causal SNP [Figure: histogram of the coefficient estimate (frequency) and distribution of the test statistic T (probability).]

58 Simulation study: true-positive and false-positive rates [Table: TPR and FPR of the approximate test and the permutation test, by number of individuals, number of SNPs and shrinkage parameter. The table values did not survive transcription.]

59 Lung Cancer Data [Figure: p-value traces showing -log p-value against shrinkage parameter for (a) the approximate test and (b) the permutation test, with individually labelled SNPs distinguished from other SNPs.]

65 Summary Prediction is a challenging problem! Ridge regression is a popular penalized regression approach that has been shown to perform well for prediction. We propose a semi-automatic method for choosing the shrinkage parameter in ridge regression, which can be applied when p >> n. We introduced a method for testing the significance of regression coefficients estimated using ridge regression. We have enabled ridge regression to be a feasible tool for genetic risk prediction on a genome-wide scale.

73 R package ridge Fitting ridge regression models to data comprising hundreds of thousands of predictors presents computational challenges. We have written an R package, ridge, for fitting such models. For large data sets, C code is used (with a user-friendly R interface). Where available, multi-core or GPU computation speeds up matrix operations. Flexibility to include non-genetic covariates - penalized or not. Significance test is implemented. Graphical outputs - ridge and p-value traces. Option for user-specified shrinkage parameter, with our semi-automatic method as the default.

74 Acknowledgements: Maria De Iorio; colleagues in the Department of Epidemiology and Biostatistics, Imperial College London; colleagues in the Department of Statistical Science, University College London; ILCO study nested within EPIC; WTCCC and GAIN studies.

75 References
[1] D. G. Clayton. Prediction and interaction in complex disease genetics: experience in type 1 diabetes. PLoS Genetics, 2009.
[2] Erika Cule and Maria De Iorio. A semi-automatic method to guide the choice of ridge parameter in ridge regression. arXiv, stat.AP, May.
[3] Erika Cule, Paolo Vineis, and Maria De Iorio. Significance testing in ridge regression for genetic data. BMC Bioinformatics, 12(1):372.
[4] Ildiko Frank and Jerome Friedman. A statistical view of some chemometrics regression tools. Technometrics, 35(2), May 1993.
[5] A. M. Halawa and M. Y. El Bassiouni. Tests of regression coefficients under ridge regression models. Journal of Statistical Computation and Simulation, 65(1), 2000.
[6] T. Hastie and R. Tibshirani. Generalized Additive Models. Chapman & Hall, 1990.
[7] Arthur E. Hoerl and R. W. Kennard. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 12(1):55-67, 1970.
[8] Clive J. Hoggart, J. C. Whittaker, M. De Iorio, and David J. Balding. Simultaneous analysis of all SNPs in genome-wide and re-sequencing association studies. PLoS Genet, 4(7).
[9] Nathalie Malo, Ondrej Libiger, and Nicholas J. Schork. Accommodating linkage disequilibrium in genetic-association analyses via ridge regression. Am J Hum Genet, 82(2), Feb 2008.
[10] Robert Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society B, 58.
[11] Hui Zou and Trevor Hastie. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society B, 67(2), Jan 2005.

Part [2.1]: Evaluation of Markers for Treatment Selection Linking Clinical and Statistical Goals

Part [2.1]: Evaluation of Markers for Treatment Selection Linking Clinical and Statistical Goals Part [2.1]: Evaluation of Markers for Treatment Selection Linking Clinical and Statistical Goals Patrick J. Heagerty Department of Biostatistics University of Washington 174 Biomarkers Session Outline

More information

VARIABLE SELECTION WHEN CONFRONTED WITH MISSING DATA

VARIABLE SELECTION WHEN CONFRONTED WITH MISSING DATA VARIABLE SELECTION WHEN CONFRONTED WITH MISSING DATA by Melissa L. Ziegler B.S. Mathematics, Elizabethtown College, 2000 M.A. Statistics, University of Pittsburgh, 2002 Submitted to the Graduate Faculty

More information

Chapter 11: Advanced Remedial Measures. Weighted Least Squares (WLS)

Chapter 11: Advanced Remedial Measures. Weighted Least Squares (WLS) Chapter : Advanced Remedial Measures Weighted Least Squares (WLS) When the error variance appears nonconstant, a transformation (of Y and/or X) is a quick remedy. But it may not solve the problem, or it

More information

Analysis of Rheumatoid Arthritis Data using Logistic Regression and Penalized Approach

Analysis of Rheumatoid Arthritis Data using Logistic Regression and Penalized Approach University of South Florida Scholar Commons Graduate Theses and Dissertations Graduate School November 2015 Analysis of Rheumatoid Arthritis Data using Logistic Regression and Penalized Approach Wei Chen

More information

Anale. Seria Informatică. Vol. XVI fasc Annals. Computer Science Series. 16 th Tome 1 st Fasc. 2018

Anale. Seria Informatică. Vol. XVI fasc Annals. Computer Science Series. 16 th Tome 1 st Fasc. 2018 HANDLING MULTICOLLINEARITY; A COMPARATIVE STUDY OF THE PREDICTION PERFORMANCE OF SOME METHODS BASED ON SOME PROBABILITY DISTRIBUTIONS Zakari Y., Yau S. A., Usman U. Department of Mathematics, Usmanu Danfodiyo

More information

Applying Machine Learning Methods in Medical Research Studies

Applying Machine Learning Methods in Medical Research Studies Applying Machine Learning Methods in Medical Research Studies Daniel Stahl Department of Biostatistics and Health Informatics Psychiatry, Psychology & Neuroscience (IoPPN), King s College London daniel.r.stahl@kcl.ac.uk

More information

Structured Association Advanced Topics in Computa8onal Genomics

Structured Association Advanced Topics in Computa8onal Genomics Structured Association 02-715 Advanced Topics in Computa8onal Genomics Structured Association Lasso ACGTTTTACTGTACAATT Gflasso (Kim & Xing, 2009) ACGTTTTACTGTACAATT Greater power Fewer false posi2ves Phenome

More information

New Enhancements: GWAS Workflows with SVS

New Enhancements: GWAS Workflows with SVS New Enhancements: GWAS Workflows with SVS August 9 th, 2017 Gabe Rudy VP Product & Engineering 20 most promising Biotech Technology Providers Top 10 Analytics Solution Providers Hype Cycle for Life sciences

More information

MEA DISCUSSION PAPERS

MEA DISCUSSION PAPERS Inference Problems under a Special Form of Heteroskedasticity Helmut Farbmacher, Heinrich Kögel 03-2015 MEA DISCUSSION PAPERS mea Amalienstr. 33_D-80799 Munich_Phone+49 89 38602-355_Fax +49 89 38602-390_www.mea.mpisoc.mpg.de

More information

Machine Learning to Inform Breast Cancer Post-Recovery Surveillance

Machine Learning to Inform Breast Cancer Post-Recovery Surveillance Machine Learning to Inform Breast Cancer Post-Recovery Surveillance Final Project Report CS 229 Autumn 2017 Category: Life Sciences Maxwell Allman (mallman) Lin Fan (linfan) Jamie Kang (kangjh) 1 Introduction

More information

Treatment effect estimates adjusted for small-study effects via a limit meta-analysis

Treatment effect estimates adjusted for small-study effects via a limit meta-analysis Treatment effect estimates adjusted for small-study effects via a limit meta-analysis Gerta Rücker 1, James Carpenter 12, Guido Schwarzer 1 1 Institute of Medical Biometry and Medical Informatics, University

More information

Multivariate Regression with Small Samples: A Comparison of Estimation Methods W. Holmes Finch Maria E. Hernández Finch Ball State University

Multivariate Regression with Small Samples: A Comparison of Estimation Methods W. Holmes Finch Maria E. Hernández Finch Ball State University Multivariate Regression with Small Samples: A Comparison of Estimation Methods W. Holmes Finch Maria E. Hernández Finch Ball State University High dimensional multivariate data, where the number of variables

More information

The impact of pre-selected variance inflation factor thresholds on the stability and predictive power of logistic regression models in credit scoring

The impact of pre-selected variance inflation factor thresholds on the stability and predictive power of logistic regression models in credit scoring Volume 31 (1), pp. 17 37 http://orion.journals.ac.za ORiON ISSN 0529-191-X 2015 The impact of pre-selected variance inflation factor thresholds on the stability and predictive power of logistic regression

More information

What is Regularization? Example by Sean Owen

What is Regularization? Example by Sean Owen What is Regularization? Example by Sean Owen What is Regularization? Name3 Species Size Threat Bo snake small friendly Miley dog small friendly Fifi cat small enemy Muffy cat small friendly Rufus dog large

More information

Multivariate dose-response meta-analysis: an update on glst

Multivariate dose-response meta-analysis: an update on glst Multivariate dose-response meta-analysis: an update on glst Nicola Orsini Unit of Biostatistics Unit of Nutritional Epidemiology Institute of Environmental Medicine Karolinska Institutet http://www.imm.ki.se/biostatistics/

More information

SNPrints: Defining SNP signatures for prediction of onset in complex diseases

SNPrints: Defining SNP signatures for prediction of onset in complex diseases SNPrints: Defining SNP signatures for prediction of onset in complex diseases Linda Liu, Biomedical Informatics, Stanford University Daniel Newburger, Biomedical Informatics, Stanford University Grace

More information

A Comparative Study of Some Estimation Methods for Multicollinear Data

A Comparative Study of Some Estimation Methods for Multicollinear Data International Journal of Engineering and Applied Sciences (IJEAS) A Comparative Study of Some Estimation Methods for Multicollinear Okeke Evelyn Nkiruka, Okeke Joseph Uchenna Abstract This article compares

More information

ARTICLE Accommodating Linkage Disequilibrium in Genetic-Association Analyses via Ridge Regression

ARTICLE Accommodating Linkage Disequilibrium in Genetic-Association Analyses via Ridge Regression ARTICLE Accommodating Linkage Disequilibrium in Genetic-Association Analyses via Ridge Regression Nathalie Malo, 1,2 Ondrej Libiger, 1,2 and Nicholas J. Schork 1,2, * Large-scale genetic-association studies

More information
