Ridge regression for risk prediction
Ridge regression for risk prediction with applications to genetic data
Erika Cule and Maria De Iorio
Department of Epidemiology and Biostatistics, School of Public Health, Imperial College London
May 2012
Outline
1. Risk Prediction using Genetic Data
2. Methods and challenges
3. Ridge Regression
   - Shrinkage parameter
   - Significance testing
4. Conclusions
Risk Prediction using Genetic Data
In the decade following the publication of the first draft of the Human Genome Sequence, genome-wide association studies have identified thousands of genetic variants associated with hundreds of diseases and traits.
Risk Prediction using Genetic Data
However, clinicians are getting impatient about the utility of these identified variants for risk prediction in complex diseases.
Risk prediction using genetic data
- Recently, questions have been raised about the potential utility of genetic risk prediction for complex diseases (Clayton, 2009).
- The aim here is to make ridge regression feasible for genetic data in a semi-automatic way.
- The framework that we propose allows for the simultaneous inclusion of all predictors genome-wide in a regression model.
- Our approach is appropriate where there are many predictors of small effect size, which is thought to be the case in genetic data.
Univariate tests of association
[Figure: genome-wide scan for seven diseases, with panels including bipolar disorder, coronary artery disease, Crohn's disease, type 1 diabetes and type 2 diabetes. For each disease, the -log10 trend-test P values of quality-control-positive SNPs are plotted against position on each chromosome. Chromosomes are shown in alternating colours for clarity, and all panels are truncated at -log10(P value) = 15, although some markers (for example, in T1D and RA) exceed this threshold. Source: WTCCC (2007), Nature Publishing Group.]
Multivariate methods
- Consider all SNPs jointly.
- Standard multivariate methods cannot be used with modern genetic data sets, which have p >> n (many more predictors than observations). Typically, additional (non-genetic) covariates are included in the analysis, further increasing the dimensionality of the data.
- Penalized regression methods constrain the size of the maximum likelihood estimates of the regression coefficients. They are known as shrinkage methods because they shrink the regression coefficients towards zero.
- A number of penalized regression approaches have been proposed in the literature: Lasso regression, HyperLasso, Elastic Net, ... and Ridge Regression.
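Prior distributions in Lasso and Ridge Regression
[Figure: prior distributions on the regression coefficients corresponding to the lasso and ridge penalties.]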
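
The figure for this slide is not preserved in the transcript. As a reminder of the standard correspondence it presumably illustrates (a textbook result, not taken from the slides), the penalized estimates can be read as posterior modes under independent priors on the coefficients. The symbols \tau, b and \lambda below are introduced here for illustration only.

```latex
\begin{align*}
\text{Ridge:}\quad & \beta_j \sim N(0, \tau^2)
  &&\Longrightarrow\quad
  \hat\beta = \arg\min_\beta \; \sum_{i=1}^{n}\Big(y_i - \sum_{j=1}^{p}\beta_j x_{ij}\Big)^2
  + k \sum_{j=1}^{p} \beta_j^2,
  \qquad k = \frac{\sigma^2}{\tau^2}, \\
\text{Lasso:}\quad & p(\beta_j) = \frac{1}{2b}\exp\!\Big(-\frac{|\beta_j|}{b}\Big)
  &&\Longrightarrow\quad
  \hat\beta = \arg\min_\beta \; \sum_{i=1}^{n}\Big(y_i - \sum_{j=1}^{p}\beta_j x_{ij}\Big)^2
  + \lambda \sum_{j=1}^{p} |\beta_j|,
  \qquad \lambda = \frac{2\sigma^2}{b}.
\end{align*}
```

The Gaussian prior is smooth at zero, so ridge estimates are shrunk but not set exactly to zero, whereas the Laplace prior has a peak at zero, which is what allows the lasso to produce exact zeros.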
Ridge regression
- Ridge regression (Hoerl & Kennard, 1970) is a penalized regression approach proposed to overcome the problems associated with multicollinearity among predictors in multiple regression.
- Among penalized regression approaches, ridge regression has been shown to offer very good predictive performance (Frank & Friedman, 1993).
- We applied ridge regression to the problem of risk prediction using genetic data obtained from genome-wide association studies.
- Ridge regression shrinks the squared length of the regression coefficient vector, which corresponds to a quadratic penalty on the coefficients.
Shrinkage parameter
- Controls the degree of shrinkage of the regression coefficients: a larger shrinkage parameter shrinks the coefficients further towards zero.
- Data-driven methods proposed in the literature cannot be applied when p >> n, because they depend on the ordinary least squares estimates.
- Ridge trace (graphical method).
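
To make the ridge trace concrete, here is a minimal sketch that computes the coefficient paths a trace plot would display. The function name `ridge_path`, the simulated data and the grid of shrinkage values are assumptions made for illustration; they are not taken from the talk or from the ridge package.

```python
import numpy as np

def ridge_path(X, y, ks):
    """Ridge coefficients for each k in ks, one row per k (hypothetical helper)."""
    U, d, Vt = np.linalg.svd(X, full_matrices=False)
    uty = U.T @ y
    # beta_k = V diag(d / (d^2 + k)) U'y, evaluated for every k on the grid
    return np.array([Vt.T @ (d / (d**2 + k) * uty) for k in ks])

rng = np.random.default_rng(3)
n, p = 100, 6
X = rng.standard_normal((n, p))
X[:, 1] = X[:, 0] + 0.05 * rng.standard_normal(n)  # two nearly collinear predictors
X -= X.mean(axis=0)                                # centre; no intercept in this sketch
y = X[:, 0] + rng.standard_normal(n)
y -= y.mean()

ks = np.logspace(-3, 3, 13)                        # illustrative grid of shrinkage values
path = ridge_path(X, y, ks)
norms = np.linalg.norm(path, axis=1)
for k, b in zip(ks, path):
    print(f"k = {k:10.3f}  coefficients = {np.round(b, 3)}")
```

Plotting each column of `path` against `ks` gives the ridge trace; the graphical method consists of choosing a value of k by eye at which the coefficients have stabilised.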
Our starting point
Linear model:
\[ Y = X\beta + \epsilon, \qquad \epsilon \stackrel{\text{iid}}{\sim} N(0, \sigma^2) \]
Ridge regression:
\[ \hat\beta_k = \arg\min_\beta \; \sum_{i=1}^{n} \Big( y_i - \sum_{j=1}^{p} \beta_j x_{ij} \Big)^2 + k \sum_{j=1}^{p} \beta_j^2 \]
Proposed by Hoerl, Kennard & Baldwin (1975):
\[ k_{HKB} = \frac{p \hat\sigma^2}{\hat\beta' \hat\beta} \]
where \hat\sigma^2 and \hat\beta are estimated from ordinary least squares (OLS).
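
The estimator and the Hoerl, Kennard & Baldwin parameter can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' implementation: the helper names `ridge_fit` and `k_hkb`, the simulated data, and the use of the residual variance with n - p degrees of freedom for \hat\sigma^2 are choices made here.

```python
import numpy as np

def ridge_fit(X, y, k):
    """Ridge estimate (X'X + kI)^{-1} X'y for shrinkage parameter k >= 0."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

def k_hkb(X, y):
    """k_HKB = p * sigma2_hat / (beta_hat' beta_hat), both taken from the OLS fit.

    Requires n > p, because the OLS fit must exist.
    """
    n, p = X.shape
    beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_ols
    sigma2_hat = resid @ resid / (n - p)   # assumed variance estimate
    return p * sigma2_hat / (beta_ols @ beta_ols)

rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.standard_normal((n, p))
X -= X.mean(axis=0)                        # centre; no intercept in this sketch
y = X @ rng.normal(0.0, 0.3, size=p) + rng.standard_normal(n)
y -= y.mean()

k = k_hkb(X, y)
beta_ols = ridge_fit(X, y, 0.0)            # k = 0 recovers OLS
beta_ridge = ridge_fit(X, y, k)
print(f"k_HKB = {k:.3f}")
print(f"squared length: OLS = {beta_ols @ beta_ols:.4f}, ridge = {beta_ridge @ beta_ridge:.4f}")
```

Because `k_hkb` depends on the OLS fit, it cannot be evaluated once p exceeds n, which is the limitation the following slides address.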
We observed
Linear model:
\[ Y = X\beta + \epsilon = Z\alpha + \epsilon, \qquad \epsilon \stackrel{\text{iid}}{\sim} N(0, \sigma^2) \]
Proposed by Hoerl, Kennard & Baldwin (1975):
\[ k_{HKB} = \frac{p \hat\sigma^2}{\hat\beta' \hat\beta} = \frac{p \hat\sigma^2}{\hat\alpha' \hat\alpha} \]
- \hat\alpha are the principal components regression (PCR) coefficients.
- PCR coefficients are available when p >> n.
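
The equality of the two denominators follows because the principal component scores are an orthogonal rotation of the predictors. A small numerical check, assuming Z = XV with V the right singular vectors of X (the slides do not spell out how Z is constructed, so this is an assumption), could look like this:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 150, 8
X = rng.standard_normal((n, p))
X -= X.mean(axis=0)
y = X @ rng.normal(0.0, 0.5, size=p) + rng.standard_normal(n)
y -= y.mean()

U, d, Vt = np.linalg.svd(X, full_matrices=False)
Z = X @ Vt.T                      # principal component scores, equal to U * d
alpha_hat = (U.T @ y) / d         # OLS on Z: (Z'Z)^{-1} Z'y = D^{-1} U'y
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# beta_hat = V alpha_hat, and V is orthogonal, so the squared lengths agree.
print(f"beta'beta   = {beta_hat @ beta_hat:.6f}")
print(f"alpha'alpha = {alpha_hat @ alpha_hat:.6f}")
```

The useful consequence is that `alpha_hat` can be computed component by component from the singular value decomposition, so the leading components remain available even when X'X is singular.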
We propose
\[ k_{HKB} = \frac{p \hat\sigma^2}{\hat\alpha' \hat\alpha} \qquad\longrightarrow\qquad k_r = \frac{r \hat\sigma_r^2}{\hat\alpha_r' \hat\alpha_r} \]
- Harmonic mean of the ideal shrinkage parameters of the PCR coefficients, with coefficients replaced by their ordinary least squares estimates.
- How many components?
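
A minimal sketch of k_r in the p >> n setting follows. The slides do not state how \hat\sigma_r^2 is computed; the code assumes the residual variance of the r-component PCR fit with n - r degrees of freedom, and the helper name `k_r`, the simulated data and the value r = 10 are illustrative choices.

```python
import numpy as np

def k_r(U, d, y, r):
    """r * sigma2_r / (alpha_r' alpha_r) from the first r principal components."""
    n = y.shape[0]
    uty = U[:, :r].T @ y
    alpha_r = uty / d[:r]                  # PCR coefficients on the first r components
    resid = y - U[:, :r] @ uty             # residual of the r-component PCR fit
    sigma2_r = resid @ resid / (n - r)     # assumed variance estimate
    return r * sigma2_r / (alpha_r @ alpha_r)

rng = np.random.default_rng(2)
n, p, r = 50, 500, 10                      # p >> n: OLS is not available
X = rng.standard_normal((n, p))
X -= X.mean(axis=0)
y = X @ rng.normal(0.0, 0.05, size=p) + rng.standard_normal(n)
y -= y.mean()

U, d, Vt = np.linalg.svd(X, full_matrices=False)
k = k_r(U, d, y, r)

# Ridge estimate for all p predictors, computed from the SVD without forming X'X.
beta_ridge = Vt.T @ (d / (d**2 + k) * (U.T @ y))
print(f"k_r = {k:.3f}, ||beta_ridge|| = {np.linalg.norm(beta_ridge):.4f}")
```

Only the first r singular vectors enter the choice of the shrinkage parameter, while the final ridge fit still uses every predictor.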
How many components?
[Figure: percentage of replicates with larger MSE using k_HKB than using k_r, shown by signal-to-noise ratio and number of PCs (r).]
How many components?
Most of the variance in genetic data can be explained by the first few principal components.
How many components?
\[ PSE = \Big\{ 1 + \frac{\mathrm{tr}(HH')}{n} \Big\} \sigma^2 + \frac{b'b}{n} = \text{variance} + \frac{\text{bias}^2}{n} \]
- H is the hat matrix: \hat{Y} = HY.
- Degrees of freedom for variance = tr(HH') (Hastie & Tibshirani, 1990).
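
For ridge regression the trace has a closed form in terms of the singular values of X. The derivation below is a standard result and is included only as a worked step; the singular value decomposition X = UDV' and the singular values d_j are notation introduced here, not on the slides.

```latex
\begin{align*}
X &= U D V', \\
H_k &= X (X'X + kI)^{-1} X'
     = U \,\mathrm{diag}\!\left( \frac{d_j^2}{d_j^2 + k} \right) U', \\
\operatorname{tr}(H_k H_k') &= \sum_{j} \left( \frac{d_j^2}{d_j^2 + k} \right)^{2}.
\end{align*}
```

For PCR on the first r components the hat matrix is the projection U_r U_r', so tr(HH') = r. For ridge regression the trace decreases from the rank of X towards zero as k increases.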
How many components?
- For given r, ridge regression (RR) estimates have less bias than PCR estimates.
- PCR using r components has r degrees of freedom for variance.
- We fixed r such that the degrees of freedom of the ridge model using r components equals r:
\[ \mathrm{tr}(HH') = r \]
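
One possible reading of this rule is a search over r: compute k_r for each candidate r, evaluate tr(HH') for the ridge fit with that k_r, and keep the r for which the two are closest. The sketch below follows that reading. The function name `choose_r`, the cap `max_r` and the variance estimate are assumptions. The cap matters because, when p >> n, the residual variance vanishes as r approaches the rank of X, so k_r tends to zero and the criterion is then met trivially.

```python
import numpy as np

def choose_r(X, y, max_r):
    """Return (r, k_r, df) minimising |tr(HH') - r| over r = 1..max_r (hypothetical helper)."""
    n = X.shape[0]
    U, d, _ = np.linalg.svd(X, full_matrices=False)
    uty = U.T @ y
    best = None
    for r in range(1, max_r + 1):
        alpha_r = uty[:r] / d[:r]                   # PCR coefficients, first r components
        rss_r = y @ y - uty[:r] @ uty[:r]           # residual sum of squares of that fit
        sigma2_r = rss_r / (n - r)                  # assumed variance estimate
        k = r * sigma2_r / (alpha_r @ alpha_r)      # k_r
        df = np.sum((d**2 / (d**2 + k)) ** 2)       # tr(HH') of the ridge fit with k_r
        if best is None or abs(df - r) < abs(best[2] - best[0]):
            best = (r, k, df)
    return best

rng = np.random.default_rng(4)
n, p = 100, 400
X = rng.standard_normal((n, p))
X -= X.mean(axis=0)
y = X @ rng.normal(0.0, 0.05, size=p) + rng.standard_normal(n)
y -= y.mean()

r, k, df = choose_r(X, y, max_r=30)
print(f"chosen r = {r}, k_r = {k:.3f}, tr(HH') = {df:.2f}")
```

In practice the cap would be chosen so that only leading components are considered, in line with the observation that the first few principal components explain most of the variance in genetic data.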
Simulation Study
[Figures: mean prediction squared error; p-value trace.]
Simulation study: performance comparison
- Methods compared: SNP ranking by univariate p-value followed by multivariate regression (using the top 0.1%, 0.5%, 1%, 3% or 4% of SNPs), HyperLasso (HLasso) and ridge regression (RR).
- Continuous and binary outcomes.
[Table: mean prediction squared error (PSE) for continuous outcomes and mean classification error (CE) for binary outcomes, by method.]
Bipolar Disorder Data
- Two GWAS of bipolar disorder: WTCCC and GAIN.
- Case-control studies, so the model was extended to logistic ridge regression.
- SNPs were typed on different platforms; Impute2 was used to obtain common SNPs.
- When determining the shrinkage parameter, training data were thinned (1 SNP every 100 kb).
- Univariate model: which significance threshold?
- HyperLasso: cross-validation to choose the parameters is computationally intensive.
[Table: mean classification error for the univariate model (by p-value threshold), HyperLasso and ridge regression.]
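
For case-control data the quadratic penalty is applied to the logistic log-likelihood. The sketch below fits such a model by Newton-Raphson. The penalty convention (log-likelihood minus k times the squared length of the coefficient vector), the omission of an intercept, the helper name `logistic_ridge` and the simulated data are all assumptions made for illustration; they do not describe the authors' implementation.

```python
import numpy as np

def logistic_ridge(X, y, k, n_iter=100, tol=1e-10):
    """Maximise loglik(beta) - k * beta'beta by Newton-Raphson (no intercept)."""
    p = X.shape[1]
    beta = np.zeros(p)
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))            # fitted probabilities
        grad = X.T @ (y - mu) - 2.0 * k * beta            # penalized score
        hess = (X.T * (mu * (1.0 - mu))) @ X + 2.0 * k * np.eye(p)
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta

rng = np.random.default_rng(5)
n, p = 400, 5
X = rng.standard_normal((n, p))
beta_true = np.array([1.0, -1.0, 0.5, 0.0, 0.0])
prob = 1.0 / (1.0 + np.exp(-(X @ beta_true)))
y = (rng.random(n) < prob).astype(float)                  # simulated case-control labels

beta_small = logistic_ridge(X, y, k=1.0)
beta_large = logistic_ridge(X, y, k=50.0)
error = np.mean((X @ beta_small > 0) != (y == 1))         # in-sample classification error
print("k = 1 :", np.round(beta_small, 3))
print("k = 50:", np.round(beta_large, 3))
print(f"classification error (k = 1): {error:.3f}")
```

The classification error printed here is computed on the training data, whereas the comparison on the slide concerns prediction error, so a held-out set would be needed to reproduce that kind of figure.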
Significance testing in ridge regression
- Ridge regression is not a variable selection method: the shrinkage penalty does not shrink any coefficient estimates to zero.
- A test of significance of ridge regression coefficients had been proposed (Halawa & El Bassiouni, 2000) and applied (Malo et al., 2008) but not evaluated.
- We extended the test to be applicable when p >> n and in logistic ridge regression, and evaluated its performance on simulated and real data sets.
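Significance test
Based on a Wald test:
\[ T_k = \frac{\hat\beta_k}{\mathrm{se}(\hat\beta_k)}, \qquad H_0: T_k \sim N(0, 1) \]
with \mathrm{se}(\hat\beta_k) obtained from the covariance matrix
\[ \mathrm{Var}(\hat\beta_k) = \hat\sigma^2 (X'X + kI)^{-1} X'X (X'X + kI)^{-1}, \]
which takes into account both the correlation in the predictors and the amount of shrinkage.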
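
The test statistic can be computed directly from the covariance formula above. In the sketch below the estimate of \hat\sigma^2 is an assumption: the slide does not say how it is obtained, and the code uses the ridge residual sum of squares divided by n - tr(2H - HH'). The helper name `ridge_wald_test`, the simulated data and the value k = 1 are likewise illustrative.

```python
import math
import numpy as np

def ridge_wald_test(X, y, k):
    """Ridge coefficients, standard errors, Wald statistics and two-sided p-values."""
    n, p = X.shape
    A = np.linalg.inv(X.T @ X + k * np.eye(p))
    beta_k = A @ X.T @ y
    H = X @ A @ X.T                                   # hat matrix of the ridge fit
    resid = y - X @ beta_k
    nu = n - np.trace(2.0 * H - H @ H.T)              # assumed residual degrees of freedom
    sigma2 = resid @ resid / nu
    cov = sigma2 * A @ (X.T @ X) @ A                  # covariance formula from the slide
    se = np.sqrt(np.diag(cov))
    T = beta_k / se
    # Two-sided p-value under N(0, 1): P(|Z| > |T|) = erfc(|T| / sqrt(2))
    pvals = np.array([math.erfc(abs(t) / math.sqrt(2.0)) for t in T])
    return beta_k, se, T, pvals

rng = np.random.default_rng(6)
n, p = 300, 20
X = rng.standard_normal((n, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)              # standardised predictors
y = 1.0 * X[:, 0] + rng.standard_normal(n)            # only the first predictor is causal
y -= y.mean()

beta_k, se, T, pvals = ridge_wald_test(X, y, k=1.0)
print(f"causal predictor: T = {T[0]:.2f}, p = {pvals[0]:.2e}")
print(f"smallest p-value among the others: {pvals[1:].min():.3f}")
```

Repeating the computation over a grid of shrinkage parameters and plotting the p-values against k gives a p-value trace of the kind mentioned in the simulation study.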
Simulation study
[Figure: distributions of the coefficient estimate (frequency) and of the test statistic T (probability) for a causal SNP and for a non-causal SNP, with the corresponding p-values.]
Simulation study: true-positive and false-positive rates
[Table: true-positive rate (TPR) and false-positive rate (FPR) of the approximate test and the permutation test, by number of individuals, number of SNPs and shrinkage parameter.]
Lung Cancer Data
[Figure: -log p-value against shrinkage parameter for (a) the approximate test and (b) the permutation test, with three highlighted SNPs (rs identifiers) shown against the other SNPs.]
Summary
- Prediction is a challenging problem!
- Ridge regression is a popular penalized regression approach that has been shown to perform well for prediction.
- We propose a semi-automatic method for choosing the shrinkage parameter in ridge regression, which can be applied when p >> n.
- We introduced a method for testing the significance of regression coefficients estimated using ridge regression.
- We have made ridge regression a feasible tool for genetic risk prediction on a genome-wide scale.
R package ridge
- Fitting ridge regression models to data comprising hundreds of thousands of predictors presents computational challenges.
- We have written an R package, ridge, for fitting such models.
- For large data sets, C code is used (with a user-friendly R interface).
- Where available, multi-core or GPU computation speeds up matrix operations.
- Flexibility to include non-genetic covariates, penalized or not.
- The significance test is implemented.
- Graphical outputs: ridge and p-value traces.
- Option for a user-specified shrinkage parameter, with our semi-automatic method as the default.
Acknowledgements
- Maria De Iorio
- Colleagues in the Department of Epidemiology and Biostatistics, Imperial College London
- Colleagues in the Department of Statistical Science, University College London
- ILCO study nested within EPIC
- WTCCC and GAIN studies
References
[1] D. G. Clayton. Prediction and interaction in complex disease genetics: experience in type 1 diabetes. PLoS Genetics, 2009.
[2] Erika Cule and Maria De Iorio. A semi-automatic method to guide the choice of ridge parameter in ridge regression. arXiv, stat.AP, May.
[3] Erika Cule, Paolo Vineis, and Maria De Iorio. Significance testing in ridge regression for genetic data. BMC Bioinformatics, 12(1):372.
[4] Ildiko Frank and Jerome Friedman. A statistical view of some chemometrics regression tools. Technometrics, 35(2), May 1993.
[5] A. M. Halawa and M. Y. El Bassiouni. Tests of regression coefficients under ridge regression models. Journal of Statistical Computation and Simulation, 65(1), 2000.
[6] T. Hastie and R. Tibshirani. Generalized Additive Models. Chapman & Hall, 1990.
[7] Arthur E. Hoerl and R. W. Kennard. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 12(1):55-67, 1970.
[8] Clive J. Hoggart, J. C. Whittaker, M. De Iorio, and David J. Balding. Simultaneous analysis of all SNPs in genome-wide and re-sequencing association studies. PLoS Genet, 4(7).
[9] Nathalie Malo, Ondrej Libiger, and Nicholas J. Schork. Accommodating linkage disequilibrium in genetic-association analyses via ridge regression. Am J Hum Genet, 82(2), Feb 2008.
[10] Robert Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society B, 58.
[11] Hui Zou and Trevor Hastie. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society B, 67(2), Jan 2005.
Application of Local Control Strategy in analyses of the effects of Radon on Lung Cancer Mortality for 2,881 US Counties Bob Obenchain, Risk Benefit Statistics, August 2015 Our motivation for using a Cut-Point
More informationAn Introduction to Bayesian Statistics
An Introduction to Bayesian Statistics Robert Weiss Department of Biostatistics UCLA Fielding School of Public Health robweiss@ucla.edu Sept 2015 Robert Weiss (UCLA) An Introduction to Bayesian Statistics
More informationThe Loss of Heterozygosity (LOH) Algorithm in Genotyping Console 2.0
The Loss of Heterozygosity (LOH) Algorithm in Genotyping Console 2.0 Introduction Loss of erozygosity (LOH) represents the loss of allelic differences. The SNP markers on the SNP Array 6.0 can be used
More informationApplied Medical. Statistics Using SAS. Geoff Der. Brian S. Everitt. CRC Press. Taylor Si Francis Croup. Taylor & Francis Croup, an informa business
Applied Medical Statistics Using SAS Geoff Der Brian S. Everitt CRC Press Taylor Si Francis Croup Boca Raton London New York CRC Press is an imprint of the Taylor & Francis Croup, an informa business A
More informationReveal Relationships in Categorical Data
SPSS Categories 15.0 Specifications Reveal Relationships in Categorical Data Unleash the full potential of your data through perceptual mapping, optimal scaling, preference scaling, and dimension reduction
More informationRISK PREDICTION MODEL: PENALIZED REGRESSIONS
RISK PREDICTION MODEL: PENALIZED REGRESSIONS Inspired from: How to develop a more accurate risk prediction model when there are few events Menelaos Pavlou, Gareth Ambler, Shaun R Seaman, Oliver Guttmann,
More informationBayesian graphical models for combining multiple data sources, with applications in environmental epidemiology
Bayesian graphical models for combining multiple data sources, with applications in environmental epidemiology Sylvia Richardson 1 sylvia.richardson@imperial.co.uk Joint work with: Alexina Mason 1, Lawrence