On Missing Data and Genotyping Errors in Association Studies

1 On Missing Data and Genotyping Errors in Association Studies Department of Biostatistics Johns Hopkins Bloomberg School of Public Health May 16, 2008

2 Specific Aims of our R01
1. Develop and evaluate new statistical methods to prioritize genes through proper ranking in genome-wide association (GWA) studies that address GxE interactions.
2. Develop and evaluate new statistical methods to localize causal genes as part of linkage and fine mapping studies while considering GxE interactions.
3. Develop and evaluate new statistical methods to identify higher order interactions between environmental variables and SNPs in candidate gene studies.

3 Specific Aims (cont.)
4. Adapt existing and develop new statistical methods to address imprecise and missing environmental and genetic measurements.
5. Develop and disseminate efficient algorithms for GxE analyses, and apply these methods in several ongoing genetic studies of complex diseases.

4 Missing Data
There are three main types of missing or unobserved data in genetic association studies:
1. Missing observations in some environmental variables.
2. Missing data at SNPs selected for genotyping.
3. Genotypes of SNPs not selected.

5 Approaches for Missing Data
The most common approach for dealing with missing data is to omit the observations that have missing records in the model's covariates. This approach can have several shortcomings, including:
- Loss of power.
- Bias in the parameter estimates.
A good reference on this topic is Greenland and Finkle (1995).
Other commonly used approaches are:
- To impute a value from the marginal distribution of the covariate.
- To create an extra level indicating missingness, if the covariate is a factor.
These choices tend not to work well either.
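The three simple strategies named on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy records, the column layout, and the function names (complete_cases, impute_marginal, missing_as_level) are made up for this example and do not come from the talk.

```python
import random

# Hypothetical toy records: (genotype, family_history); None marks a missing value.
records = [
    ("AA", "yes"), ("AC", None), ("CC", "no"),
    ("AC", "no"), ("AA", None), ("CC", "yes"),
]

def complete_cases(rows):
    """Omit every record that has a missing covariate."""
    return [r for r in rows if None not in r]

def impute_marginal(rows, col, rng):
    """Fill missing values in column `col` with draws from its observed values."""
    observed = [r[col] for r in rows if r[col] is not None]
    completed = []
    for r in rows:
        r = list(r)
        if r[col] is None:
            r[col] = rng.choice(observed)  # ignores all other covariates
        completed.append(tuple(r))
    return completed

def missing_as_level(rows, col, label="missing"):
    """Recode missing values of a factor as an extra level."""
    recoded = []
    for r in rows:
        r = list(r)
        if r[col] is None:
            r[col] = label
        recoded.append(tuple(r))
    return recoded

rng = random.Random(0)  # fixed seed so the sketch is reproducible
cc = complete_cases(records)
marginal = impute_marginal(records, 1, rng)
dummy = missing_as_level(records, 1)
```

Complete-case analysis shrinks the data set, the marginal draw ignores any relationship between the covariate and the rest of the record, and the extra level treats "missing" as if it were a real category.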

6 Approaches for Missing Data Multiple imputation can be used to draw valid statistical inference from data with missing values when the data are missing at random (Little and Rubin 1987, Schafer 1997). In essence, multiple imputation acknowledges the uncertainty due to missing data, instead of simply ignoring it: several complete data sets are generated, and the uncertainty in the model parameter estimates incorporates the standard errors of the parameter estimates as well as the variability between the parameter estimates from the replicate data sets. While the hypothesis of missing at random cannot formally be tested, it is a lot less stringent than the requirement of missing completely at random, which is the underlying assumption made when observations are omitted.
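How the two sources of uncertainty are combined can be sketched with the standard pooling rules usually attributed to Rubin: the pooled estimate is the mean of the per-data-set estimates, and the total variance is the mean within-imputation variance plus (1 + 1/m) times the between-imputation variance. The function name pool_estimates and the example numbers are hypothetical, and the slides do not state which pooling formula was used in the talk.

```python
from statistics import mean, variance

def pool_estimates(estimates, std_errors):
    """Pool m per-imputation estimates and their standard errors.

    Returns (pooled_estimate, pooled_standard_error).
    """
    m = len(estimates)
    q_bar = mean(estimates)                       # pooled point estimate
    within = mean([se ** 2 for se in std_errors])  # average within-imputation variance
    between = variance(estimates)                  # between-imputation variance (divisor m - 1)
    total = within + (1.0 + 1.0 / m) * between
    return q_bar, total ** 0.5

# Hypothetical log odds ratios from m = 3 imputed data sets.
log_or = [0.2, 0.4, 0.3]
se = [0.1, 0.1, 0.1]
pooled, pooled_se = pool_estimates(log_or, se)
```

The pooled standard error is never smaller than the average within-imputation standard error, which is how the extra uncertainty due to the missing values shows up in the inference.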

7 Missing Environmental Data
[Table: number of pairs, odds ratio, and confidence interval, each reported for the original data set and for the multiple imputations. Rows: XPD Lys751Gln, XPD Gln751Gln, Positive Family History. The numeric entries did not survive transcription.]

8 Missing Environmental Data
[Table: genotype counts (AA, AC, CC, na) for cases and controls, given as raw numbers and as percentages, separately for records with family history not complete and with family history complete. The numeric entries did not survive transcription.]
Reference: Brewster AM et al (2006). Polymorphisms of the DNA Repair Genes XPD (Lys751Gln) and XRCC1 (Arg399Gln and Arg194Trp): Relationship to Breast Cancer Risk and Familial Predisposition to Breast Cancer. Breast Cancer Res Treat, 95(1):

9 Missing Environmental Data
[Figure: three panels showing the XPD Lys751Gln odds ratio, the XPD Gln751Gln odds ratio, and the Family History odds ratio, each for the combined and the original data.]
The missing data were imputed using decision trees.
Reference: Dai J et al (2006). Imputation Methods to Improve Inference in SNP Association Studies. Genetic Epidemiology, 30(8):

10 Why Become a Biostatistician? Because people appreciate your help analyzing their data, and that means that people surely will like you.

11 Tree-based Imputation Classification trees are great for categorical data!
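As a rough illustration of why classification trees suit categorical covariates, here is a minimal sketch of a depth-one tree (a "stump") that imputes a missing factor. It is not the procedure of Dai et al (2006): the data, the field names (snp, status, fh), the Gini criterion, and the functions fit_stump and impute are all assumptions made for this example, and the predictors are assumed to be fully observed.

```python
import random
from collections import Counter, defaultdict

def gini(labels):
    """Gini impurity of a list of categorical labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def fit_stump(rows, target, predictors):
    """Pick the predictor whose categories give the lowest weighted Gini impurity."""
    train = [r for r in rows if r[target] is not None]
    best = None
    for p in predictors:
        leaves = defaultdict(list)
        for r in train:
            leaves[r[p]].append(r[target])
        score = sum(len(v) * gini(v) for v in leaves.values()) / len(train)
        if best is None or score < best[0]:
            best = (score, p, dict(leaves))
    _, split, leaves = best
    return split, leaves, [r[target] for r in train]

def impute(rows, target, predictors, rng):
    """Fill missing targets with a draw from the matching leaf of the stump."""
    split, leaves, fallback = fit_stump(rows, target, predictors)
    completed = []
    for r in rows:
        r = dict(r)
        if r[target] is None:
            # Unseen category: fall back to the marginal distribution.
            r[target] = rng.choice(leaves.get(r[split], fallback))
        completed.append(r)
    return completed

# Hypothetical toy data; fh (family history) is missing in the last two records.
rows = [
    {"snp": "AA", "status": "case", "fh": "yes"},
    {"snp": "AA", "status": "control", "fh": "yes"},
    {"snp": "CC", "status": "case", "fh": "no"},
    {"snp": "CC", "status": "control", "fh": "no"},
    {"snp": "AA", "status": "case", "fh": None},
    {"snp": "CC", "status": "control", "fh": None},
]
completed = impute(rows, "fh", ["snp", "status"], random.Random(1))
```

Drawing from the leaf distribution, rather than always taking the leaf's most common level, keeps some randomness in the imputed values, so repeating the call with different seeds would yield several completed data sets.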

12 Dummy Levels for Missingness
[Figure: hazard ratios for SNP 1 through SNP 13.]
Unpublished data.

13 Incorporating Genotype Uncertainty
The confidence in genotype calls can differ substantially between SNPs!
[Figure: sense versus antisense; concordant calls (AA, AB, BB) and a discordant call, called BB (AB).]

14 Missingness at Random? From the white paper, updates/brlmm algorithm.affx

15 Incorporating Genotype Uncertainty
[Figure (truncated): log(ratio), with Ratio = correlation(genotype, crlmm) / correlation(genotype, brlmm).]

16 Incorporating Genotype Uncertainty
[Figure: median rank over 10,000 simulations against RR: θ, in two panels (Easy SNP, Difficult SNP), comparing True Genotype, CRLMM (continuous), CRLMM (called), BRLMM (all), and BRLMM (selected).]

17 Incorporating Genotype Uncertainty
[Figure (truncated): log(ratio), with Ratio = correlation(genotype, crlmm) / correlation(genotype, crlmm[called]).]
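One common way to carry genotype uncertainty into an association analysis is to use a continuous expected allele count instead of a hard call. The sketch below contrasts the two. It assumes that posterior probabilities for AA, AB, and BB are available for each sample, which the slides do not spell out. The function names and the 0.9 confidence threshold are hypothetical and are not part of the CRLMM or BRLMM software.

```python
def expected_dosage(probs):
    """Expected number of B alleles given (P(AA), P(AB), P(BB))."""
    p_aa, p_ab, p_bb = probs
    return p_ab + 2.0 * p_bb

def hard_call(probs, min_confidence=0.9):
    """Most likely genotype, or None (missing) when confidence is too low."""
    labels = ("AA", "AB", "BB")
    best = max(range(3), key=lambda i: probs[i])
    return labels[best] if probs[best] >= min_confidence else None

# Hypothetical posterior probabilities for two samples at one SNP.
confident = (0.01, 0.01, 0.98)
uncertain = (0.05, 0.15, 0.80)

calls = [hard_call(confident), hard_call(uncertain)]
dosages = [expected_dosage(confident), expected_dosage(uncertain)]
```

With hard calls the uncertain sample becomes a missing genotype, and that missingness depends on how difficult the SNP is to call. The continuous dosage keeps the sample in the analysis and lets its uncertainty pull the value away from 2.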

18 Incorporating Genotype Uncertainty - CNVs
[Figure: regions labeled A, B, C, D, and E.]

19 Incorporating Genotype Uncertainty - CNVs
[Figure: states Deletion, Normal, LOH, and Amplification for regions A to E; tracks labeled Van ICE; position in Mb.]

20 A HapMap Sample
[Figure: states Deletion, Normal, LOH, and Amplification; tracks labeled Van ICE; position in Mb.]

21 Many HapMap Samples

22 De Novo Deletion


More information

Rare Variant Burden Tests. Biostatistics 666

Rare Variant Burden Tests. Biostatistics 666 Rare Variant Burden Tests Biostatistics 666 Last Lecture Analysis of Short Read Sequence Data Low pass sequencing approaches Modeling haplotype sharing between individuals allows accurate variant calls

More information

Introduction to Genetics and Genomics

Introduction to Genetics and Genomics 2016 Introduction to enetics and enomics 3. ssociation Studies ggibson.gt@gmail.com http://www.cig.gatech.edu Outline eneral overview of association studies Sample results hree steps to WS: primary scan,

More information

Sensitivity Analysis in Observational Research: Introducing the E-value

Sensitivity Analysis in Observational Research: Introducing the E-value Sensitivity Analysis in Observational Research: Introducing the E-value Tyler J. VanderWeele Harvard T.H. Chan School of Public Health Departments of Epidemiology and Biostatistics 1 Plan of Presentation

More information

Friday, September 9, :00-11:00 am Warwick Evans Conference Room, Building D Refreshments will be provided at 9:45am

Friday, September 9, :00-11:00 am Warwick Evans Conference Room, Building D Refreshments will be provided at 9:45am The Role of the Biostatistician in Cancer Research Edmund A. Gehan, PhD Professor Emeritus, Department of Biostatistics, Bioinformatics and Biomathematics Lombardi Comprehensive Cancer Center Georgetown

More information

MISSING DATA AND PARAMETERS ESTIMATES IN MULTIDIMENSIONAL ITEM RESPONSE MODELS. Federico Andreis, Pier Alda Ferrari *

MISSING DATA AND PARAMETERS ESTIMATES IN MULTIDIMENSIONAL ITEM RESPONSE MODELS. Federico Andreis, Pier Alda Ferrari * Electronic Journal of Applied Statistical Analysis EJASA (2012), Electron. J. App. Stat. Anal., Vol. 5, Issue 3, 431 437 e-issn 2070-5948, DOI 10.1285/i20705948v5n3p431 2012 Università del Salento http://siba-ese.unile.it/index.php/ejasa/index

More information

Propensity Score Methods for Estimating Causality in the Absence of Random Assignment: Applications for Child Care Policy Research

Propensity Score Methods for Estimating Causality in the Absence of Random Assignment: Applications for Child Care Policy Research 2012 CCPRC Meeting Methodology Presession Workshop October 23, 2012, 2:00-5:00 p.m. Propensity Score Methods for Estimating Causality in the Absence of Random Assignment: Applications for Child Care Policy

More information

Introduction to Observational Studies. Jane Pinelis

Introduction to Observational Studies. Jane Pinelis Introduction to Observational Studies Jane Pinelis 22 March 2018 Outline Motivating example Observational studies vs. randomized experiments Observational studies: basics Some adjustment strategies Matching

More information

STATISTICS & PROBABILITY

STATISTICS & PROBABILITY STATISTICS & PROBABILITY LAWRENCE HIGH SCHOOL STATISTICS & PROBABILITY CURRICULUM MAP 2015-2016 Quarter 1 Unit 1 Collecting Data and Drawing Conclusions Unit 2 Summarizing Data Quarter 2 Unit 3 Randomness

More information

November 9, Johns Hopkins School of Medicine, Baltimore, MD,

November 9, Johns Hopkins School of Medicine, Baltimore, MD, Fast detection of de-novo copy number variants from case-parent SNP arrays identifies a deletion on chromosome 7p14.1 associated with non-syndromic isolated cleft lip/palate Samuel G. Younkin 1, Robert

More information

Summary & general discussion

Summary & general discussion Summary & general discussion 160 chapter 8 The aim of this thesis was to identify genetic and environmental risk factors for behavioral problems, in particular Attention Problems (AP) and Attention Deficit

More information

Lec 02: Estimation & Hypothesis Testing in Animal Ecology

Lec 02: Estimation & Hypothesis Testing in Animal Ecology Lec 02: Estimation & Hypothesis Testing in Animal Ecology Parameter Estimation from Samples Samples We typically observe systems incompletely, i.e., we sample according to a designed protocol. We then

More information

Nature Genetics: doi: /ng Supplementary Figure 1

Nature Genetics: doi: /ng Supplementary Figure 1 Supplementary Figure 1 Illustrative example of ptdt using height The expected value of a child s polygenic risk score (PRS) for a trait is the average of maternal and paternal PRS values. For example,

More information

NGS panels in clinical diagnostics: Utrecht experience. Van Gijn ME PhD Genome Diagnostics UMCUtrecht

NGS panels in clinical diagnostics: Utrecht experience. Van Gijn ME PhD Genome Diagnostics UMCUtrecht NGS panels in clinical diagnostics: Utrecht experience Van Gijn ME PhD Genome Diagnostics UMCUtrecht 93 Gene panels UMC Utrecht Cardiovascular disease (CAR) (5 panels) Epilepsy (EPI) (11 panels) Hereditary

More information

Imaging Genetics: Heritability, Linkage & Association

Imaging Genetics: Heritability, Linkage & Association Imaging Genetics: Heritability, Linkage & Association David C. Glahn, PhD Olin Neuropsychiatry Research Center & Department of Psychiatry, Yale University July 17, 2011 Memory Activation & APOE ε4 Risk

More information

Conditions. Name : dummy Age/sex : xx Y /x. Lab No : xxxxxxxxx. Rep Centre : xxxxxxxxxxx Ref by : Dr. xxxxxxxxxx

Conditions. Name : dummy Age/sex : xx Y /x. Lab No : xxxxxxxxx. Rep Centre : xxxxxxxxxxx Ref by : Dr. xxxxxxxxxx Name : dummy Age/sex : xx Y /x Lab No : xxxxxxxxx Rep Centre : xxxxxxxxxxx Ref by : Dr. xxxxxxxxxx Rec. Date : xx/xx/xx Rep Date : xx/xx/xx GENETIC MAPPING FOR ONCOLOGY Conditions Melanoma Prostate Cancer

More information

(true) Disease Condition Test + Total + a. a + b True Positive False Positive c. c + d False Negative True Negative Total a + c b + d a + b + c + d

(true) Disease Condition Test + Total + a. a + b True Positive False Positive c. c + d False Negative True Negative Total a + c b + d a + b + c + d Biostatistics and Research Design in Dentistry Reading Assignment Measuring the accuracy of diagnostic procedures and Using sensitivity and specificity to revise probabilities, in Chapter 12 of Dawson

More information

P E R S P E C T I V E S

P E R S P E C T I V E S PHOENIX CENTER FOR ADVANCED LEGAL & ECONOMIC PUBLIC POLICY STUDIES Revisiting Internet Use and Depression Among the Elderly George S. Ford, PhD June 7, 2013 Introduction Four years ago in a paper entitled

More information

Review of Pre-crash Behaviour in Fatal Road Collisions Report 1: Alcohol

Review of Pre-crash Behaviour in Fatal Road Collisions Report 1: Alcohol Review of Pre-crash Behaviour in Fatal Road Collisions Research Department Road Safety Authority September 2011 Contents Executive Summary... 3 Introduction... 4 Road Traffic Fatality Collision Data in

More information

Title: A robustness study of parametric and non-parametric tests in Model-Based Multifactor Dimensionality Reduction for epistasis detection

Title: A robustness study of parametric and non-parametric tests in Model-Based Multifactor Dimensionality Reduction for epistasis detection Author's response to reviews Title: A robustness study of parametric and non-parametric tests in Model-Based Multifactor Dimensionality Reduction for epistasis detection Authors: Jestinah M Mahachie John

More information

Estimating heritability for cause specific mortality based on twin studies

Estimating heritability for cause specific mortality based on twin studies Estimating heritability for cause specific mortality based on twin studies Thomas H. Scheike, Klaus K. Holst Department of Biostatistics, University of Copenhagen Øster Farimagsgade 5, DK-1014 Copenhagen

More information

Inferring causality in observational epidemiology: Breast Cancer Risk as an Example

Inferring causality in observational epidemiology: Breast Cancer Risk as an Example Inferring causality in observational epidemiology: Breast Cancer Risk as an Example Mary Beth Terry, PhD Department of Epidemiology and Environmental Sciences Cancer Genes vs Environmental Risk Factors

More information

Missing data in medical research is

Missing data in medical research is Abstract Missing data in medical research is a common problem that has long been recognised by statisticians and medical researchers alike. In general, if the effect of missing data is not taken into account

More information

Adjusting for mode of administration effect in surveys using mailed questionnaire and telephone interview data

Adjusting for mode of administration effect in surveys using mailed questionnaire and telephone interview data Adjusting for mode of administration effect in surveys using mailed questionnaire and telephone interview data Karl Bang Christensen National Institute of Occupational Health, Denmark Helene Feveille National

More information