False Discovery Rates and Copy Number Variation. Bradley Efron and Nancy Zhang Stanford University


2 Three Statistical Centuries
19th (Quetelet): Huge data sets, simple questions
20th (Fisher, Neyman, Hotelling, ...): Small data sets, simple questions
21st (Scientific mass production): Huge data sets, complicated questions
FDRs and CNV 1

3 Example: Copy Number Variation (CNV)
Gains and losses of chromosome segments (disease association).
Instead of 2 copies, a subject might have 0, 1, 3, 4, ...
Data: $x_{ij}$ = noisy measurement of copy number for subject $j$ at marker position $i$, for $i = 1, 2, \dots, N$ ($N = 5000$) and $j = 1, 2, \dots, n$ ($n = 150$) (< 1% of data!).
$x_{ij}$ is approximately normal with mean 0 if copy number = 2.

4 What We Expect To See
Hot positions: positions $i$ where several subjects $j$ show unusually high (or low) values $x_{ij}$.
For some subjects $j$: intervals of high (or low) values $x_{ij}$.
Information on CNV locations in both directions.

5 [Figure] Lowest .001 of the 750,000 entries x[i,j]; subject 45 shows an interval of low values around position 3800; is position 1755 CNV prone? Axes: position number i, subject number j.

6 Z-Values
$X$: $x_{ij}$ for $i = 1, 2, \dots, N = 5000$ positions and $j = 1, 2, \dots, n = 150$ subjects.
$C_i$ = all subjects at the $i$th position ($n = 150$).
Moving averages: replace $x_{ij}$ with $\tilde{x}_{ij} = \sum_{l=i-5}^{i+5} x_{lj} / 11$, giving $\tilde{X}$ (subject $j$'s measurements averaged over nearby positions).
Standardize rows of $\tilde{X}$: subtract the row median, then divide by a robust row scale (robust standardization).
This gives the $Z$ matrix $z_{ij}$, which feeds the iterative $\mathrm{fdr}_i$ estimates.
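
The moving-average and row-standardization steps above can be sketched as follows. This is only an illustration, not the authors' code: the function names, the truncated window at the ends of a row, and the use of 1.4826 times the median absolute deviation as the robust scale are all assumptions, since the slide says only "robust standardization".

```python
from statistics import median

def moving_average(row, half_width=5):
    """Average each entry with its neighbours within half_width positions.

    Assumption: the window is truncated at the ends of the row, because
    the slides do not say how edges are handled.
    """
    n = len(row)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half_width), min(n, i + half_width + 1)
        window = row[lo:hi]
        out.append(sum(window) / len(window))
    return out

def robust_standardize(row):
    """Subtract the row median and divide by a robust scale estimate.

    Assumption: scale = 1.4826 * median absolute deviation (MAD).
    """
    med = median(row)
    mad = median(abs(v - med) for v in row)
    scale = 1.4826 * mad
    if scale == 0:
        raise ValueError("row has zero robust scale")
    return [(v - med) / scale for v in row]

def z_values(x_rows, half_width=5):
    """x_rows: one list per subject j, indexed by marker position i."""
    return [robust_standardize(moving_average(r, half_width)) for r in x_rows]
```

Here each row holds one subject's measurements along the chromosome, which matches the later description of shifting "row j of X".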

7 Simultaneous Hypothesis Testing
$M$ null hypotheses $H_{01}, H_{02}, \dots, H_{0M}$ ($M = 750{,}000$ for CNV).
Case $m$ has test statistic $z_m$ and null density $f_0(z)$.
The problem: given $z = (z_1, z_2, \dots, z_M)$, simultaneously test all $M$ null hypotheses and don't make many mistakes!

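8 The Bayesian Two-Groups Model
Null: prior probability $\pi_0$, density $f_0(z)$.
Non-null: prior probability $\pi_1 = 1 - \pi_0$, density $f_1(z)$.
Mixture: $f(z) = \pi_0 f_0(z) + \pi_1 f_1(z)$.
Local false discovery rate: $\mathrm{fdr}(z) = \Pr\{\text{null} \mid z\} = \pi_0 f_0(z) / f(z)$.
Empirical Bayes: $z \rightarrow \hat\pi_0, \hat f_0, \hat f \rightarrow \widehat{\mathrm{fdr}}(z) = \hat\pi_0 \hat f_0(z) / \hat f(z)$.
Reject $H_{0m}$ if $\widehat{\mathrm{fdr}}(z_m)$ is small (see Efron, 2008).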
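
A small numerical sketch of the local false discovery rate formula. The empirical Bayes estimation step is not shown; the densities are simply taken as known normals. The values 0.954 and N(.04, .93^2) are the estimates quoted on the next slide, while the non-null density N(3, 1) and the function names are hypothetical choices made for illustration.

```python
import math

def normal_pdf(z, mean, sd):
    """Density of N(mean, sd^2) at z."""
    return math.exp(-0.5 * ((z - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def local_fdr(z, pi0, f0, f1):
    """fdr(z) = pi0 * f0(z) / f(z), with f = pi0 * f0 + (1 - pi0) * f1."""
    null_part = pi0 * f0(z)
    mixture = null_part + (1.0 - pi0) * f1(z)
    return null_part / mixture

pi0 = 0.954
f0 = lambda z: normal_pdf(z, 0.04, 0.93)   # null density from the slides
f1 = lambda z: normal_pdf(z, 3.0, 1.0)     # hypothetical non-null density

for z in (0.0, 2.0, 4.0):
    print(z, round(local_fdr(z, pi0, f0, f1), 3))
```

Under these assumed densities the fdr is close to 1 near z = 0 and falls toward 0 as z moves into the non-null tail.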

9 [Figure] Estimated local false discovery rate, all 750,000 z-values; pihat0 = .954, estimated null density N(.04, .93^2); fdrhat(z) = .1 at z = 3.30 and 3.57. Axes: z value, local fdr.

10 [Figure] z-values at position i=1755 (solid histogram) compared to all the others (line). Second panel: Now for position i= . Horizontal axis runs from low cnv to high cnv.

11 A More General Model
Classes: $C_1, C_2, \dots, C_i, \dots, C_N$ with $n_i$ cases in $C_i$.
CNV: $C_i$ = $i$th column, $n_i = n = 150$ (the $n = 150$ subjects measured at position $i$).
$\mathrm{fdr}_i(z) = \pi_{i0} f_{i0}(z) / f_i(z) = \Pr\{\text{null} \mid z, C_i\}$

12 Combined and Separate Fdr's
Strategy: estimate $\mathrm{fdr}(z) = \pi_0 f_0(z) / f(z)$ from the combined data and then modify it for $C_i$.
Assume $f_{i0}(z)$ and $f_{i1}(z)$ do not depend on $i$, with only $\pi_{i1} = \Pr\{\text{non-null} \mid C_i\}$ varying across classes:
$\mathrm{fdr}_i(z) = \mathrm{fdr}(z) / [1 + \mathrm{tdr}(z) S_i]$,
where $\mathrm{tdr}(z) \equiv 1 - \mathrm{fdr}(z)$ is the true discovery rate and $S_i = \dfrac{\pi_{i1}/\pi_{i0}}{\pi_1/\pi_0} - 1$.
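
The class-specific formula follows from Bayes' rule in a few lines. The derivation below uses only the assumptions stated on this slide; $R_i$ is a shorthand introduced here for the odds ratio and does not appear in the original.

```latex
\begin{align*}
f_0(z) &= \mathrm{fdr}(z)\, f(z) / \pi_0, \qquad
f_1(z) = \mathrm{tdr}(z)\, f(z) / \pi_1, \\
\mathrm{fdr}_i(z)
  &= \frac{\pi_{i0} f_0(z)}{\pi_{i0} f_0(z) + \pi_{i1} f_1(z)}
   = \frac{\mathrm{fdr}(z)}{\mathrm{fdr}(z) + R_i\, \mathrm{tdr}(z)},
   \qquad R_i = \frac{\pi_{i1}/\pi_{i0}}{\pi_1/\pi_0}, \\
  &= \frac{\mathrm{fdr}(z)}{1 + \mathrm{tdr}(z)\,(R_i - 1)}
   = \frac{\mathrm{fdr}(z)}{1 + \mathrm{tdr}(z)\, S_i}.
\end{align*}
```

The last line uses $\mathrm{fdr}(z) = 1 - \mathrm{tdr}(z)$ and $S_i = R_i - 1$, so a class with the same non-null odds as the whole data set ($R_i = 1$) keeps the combined fdr unchanged.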

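13 Iterative Estimation of $\mathrm{fdr}_i(z)$ (Model 1)
First: estimate $\widehat{\mathrm{fdr}}(z) = \hat\pi_0 \hat f_0(z) / \hat f(z)$ from the combined data $(z_1, z_2, \dots, z_M)$.
If there are $k_i$ non-nulls in $C_i$: $\hat\pi_{i1} = k_i / n$ gives $\hat S_i$ and
$\widehat{\mathrm{fdr}}_i(z) = \widehat{\mathrm{fdr}}(z) / (1 + \widehat{\mathrm{tdr}}(z) \hat S_i)$, with $\widehat{\mathrm{tdr}}_i = 1 - \widehat{\mathrm{fdr}}_i$.
But $\hat k_i = \sum_{C_i} \widehat{\mathrm{tdr}}_i(z_m)$ estimates $k_i$.
Iterate! (5 cycles plenty in what follows)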
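
The iteration for a single class can be sketched as below, assuming the combined fdr values for the cases in $C_i$ and the overall non-null proportion are already available. The function name, the starting value (the sum of combined tdr values), and the clipping of $\hat\pi_{i1}$ away from 0 and 1 are assumptions added to keep the sketch well defined.

```python
def iterate_khat(fdr_vals, pi1, n_iter=5, eps=1e-6):
    """Iterate khat_i for one class C_i.

    fdr_vals: combined fdr(z_m) for the n cases in C_i.
    pi1:      overall non-null proportion (pi0 = 1 - pi1).
    Returns (khat_i, list of class-specific fdr_i values).
    """
    n = len(fdr_vals)
    pi0 = 1.0 - pi1
    tdr = [1.0 - f for f in fdr_vals]
    k = sum(tdr)                      # assumed start: combined tdr summed over C_i
    fdr_i = list(fdr_vals)
    for _ in range(n_iter):
        # Clip so the odds pi_i1 / pi_i0 stay finite and positive (assumption).
        pi_i1 = min(max(k / n, eps), 1.0 - eps)
        s = (pi_i1 / (1.0 - pi_i1)) / (pi1 / pi0) - 1.0
        fdr_i = [f / (1.0 + t * s) for f, t in zip(fdr_vals, tdr)]
        k = sum(1.0 - f for f in fdr_i)
    return k, fdr_i

# Hypothetical class: 40 cases with small fdr, 110 with fdr near 1.
khat, _ = iterate_khat([0.01] * 40 + [0.99] * 110, pi1=0.046)
```

A class whose average tdr equals the overall $\pi_1$ is a fixed point of the iteration; an enriched class like the hypothetical one above has its estimate pushed upward from the starting value.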

14 [Figure] Points where fdrhat < .01. Five iterations of Model 1, z[i,j] from moving averages (i-5, i+5). Axes: marker position, subject.

15 [Figure] First panel: points where fdrhat.i < .01 (five iterations), z[i,j] from moving averages (i-5, i+5); axes: marker position i, subject j. Second panel: khat[i] estimates for the 5000 positions; khat[1755] = ; axes: marker position i, khat[i].

16 [Figure] Points where fdrk < .01; closeup of positions 1700:1800; shows possible CNV region at 1750: . Panels: subject against marker position, and khat against marker position.

17 Is Position 1755 Significant?
$\hat k_{1755} = 39.1$. Believe CNV action at 1755? [$\bar k = 8.13$]
Permutation test:
Randomly shift row $j$ of $X$ by $s_j$ units left (with wraparound): $x^*_j = (x_{s+1,j}, x_{s+2,j}, \dots, x_{5000,j}, x_{1j}, x_{2j}, \dots, x_{sj})$, with $s = s_j$.
Do this for all 150 rows.
Recalculate the $\hat k^*_i$ values.
Compare $\hat k_{1755} = 39.1$ with $\{\hat k^*_i,\ i = 1, 2, \dots, 5000\}$.
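
The row-shifting step of the permutation test can be sketched as follows. Only the shift is shown; recomputing the khat values from the shifted matrix is left out. The function names and the small example rows are hypothetical.

```python
import random

def circular_shift(row, s):
    """Shift a row s units left with wraparound."""
    s %= len(row)
    return row[s:] + row[:s]

def permute_rows(x_rows, rng):
    """Apply an independent random circular shift s_j to each subject's row."""
    return [circular_shift(r, rng.randrange(len(r))) for r in x_rows]

rng = random.Random(0)
x_rows = [[1, 2, 3, 4, 5], [10, 20, 30, 40, 50]]   # two toy subjects
shifted = permute_rows(x_rows, rng)
```

Because each row is rotated rather than shuffled, any interval structure within a subject is preserved while the alignment of positions across subjects is broken.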

18 [Figure] Actual khat distribution compared to permutation distribution; maximum khat = 23.3; 39.1 marked on the khat axis. Axes: khat values, frequency. Legend: actual, permutations.

19 Locally Most Powerful Tests
Let $r_i = \pi_{i1} / \pi_1 = \Pr\{\text{non-null} \mid C_i\} / \Pr\{\text{non-null}\}$.
Log likelihood: $l_i = \sum_{j=1}^{n} \log\{1 + (r_i - 1) T(z_{ij})\}$, where $T(z) = (\mathrm{tdr}(z) - \pi_1) / \pi_0$.
$\hat k_i$ is nearly the MLE in this model.
Test $H_{0i}: r_i = 1$ vs $r_i > 1$.
The locally most powerful test rejects for large values of $\hat k_i^{(1)}$.
Use permutations to get p-values.
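
A short numerical check of why the locally most powerful test depends on the data only through the sum of tdr values. The sketch assumes the log likelihood has the form $\sum_j \log\{1 + (r_i - 1) T(z_{ij})\}$ with $T(z) = (\mathrm{tdr}(z) - \pi_1)/\pi_0$; the function names and the example tdr values are hypothetical.

```python
import math

def T(tdr, pi1):
    """T(z) = (tdr(z) - pi1) / pi0, written in terms of the tdr value."""
    return (tdr - pi1) / (1.0 - pi1)

def log_lik(r, tdr_vals, pi1):
    """Log likelihood of r for one class, up to a term free of r."""
    return sum(math.log(1.0 + (r - 1.0) * T(t, pi1)) for t in tdr_vals)

def score_at_null(tdr_vals, pi1):
    """Derivative of log_lik at r = 1, equal to (sum of tdr - n * pi1) / pi0."""
    return sum(T(t, pi1) for t in tdr_vals)

tdr_vals = [0.9, 0.5, 0.02, 0.01]     # hypothetical tdr values in one class
pi1 = 0.046
h = 1e-6
numeric = (log_lik(1 + h, tdr_vals, pi1) - log_lik(1 - h, tdr_vals, pi1)) / (2 * h)
```

The score at $r_i = 1$ is an increasing linear function of $\sum_j \mathrm{tdr}(z_{ij})$, which appears to be what the slide denotes $\hat k_i^{(1)}$, the first-cycle estimate.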

20 Bootstrapping $\hat k_i$ Estimates
Resample rows (i.e., subjects).
Recompute the iterative estimate $\hat k^*_i$ (5 iterations model).
$\widehat{sd}_i$ = bootstrap standard deviation of $\hat k^*_i$, $B = 100$ resamples (did not recompute the original fdr curve each time).
Normal approximation: $\hat k_i \;\dot\sim\; N(k_i, \widehat{sd}_i^2)$ [$\widehat{sd}_i$ between 6 and 7 for $\hat k_i > 20$].
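
The resampling scheme can be sketched generically as below. The slides recompute the 5-iteration estimate on each resample; to keep the example short, the statistic here is passed in as a function, and the plain sum of tdr values is used as a stand-in. The function name and the example data are hypothetical.

```python
import random
from statistics import stdev

def bootstrap_sd(values, stat, n_boot=100, seed=0):
    """Bootstrap standard deviation of stat(values).

    values: one entry per subject (e.g. that subject's tdr at position i).
    stat:   function mapping a resampled list to a number.
    """
    rng = random.Random(seed)
    n = len(values)
    reps = []
    for _ in range(n_boot):
        # Resample subjects with replacement.
        sample = [values[rng.randrange(n)] for _ in range(n)]
        reps.append(stat(sample))
    return stdev(reps)

# Hypothetical position: 40 of 150 subjects look non-null.
tdr_at_position = [0.95] * 40 + [0.02] * 110
sd_hat = bootstrap_sd(tdr_at_position, sum)
```

Passing the statistic as an argument means the same function could be reused with a full iterative khat estimate in place of `sum`.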

21 [Figure] Bootstrap estimates of standard deviations for khat[i] values (5 iterations), plotted vs khat[i]; sdhat[1755] = 6.5. Axes: khat[i], bootstrap stdev.

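22 Brown Stein Robbins Estimation
Suppose $\mu \sim g(\cdot)$ and $x \mid \mu \sim N(\mu, \sigma^2)$, with $l(x) \equiv$ log marginal density of $x$. Then
$\mu \mid x \;\dot\sim\; \left( x + \sigma^2 l'(x),\ \sigma^2 (1 + \sigma^2 l''(x)) \right)$ (posterior mean, posterior variance).
Apply with $\mu = k_i$, $x = \hat k_i$, $\hat l(x)$ = log smoothed density of $\{\hat k_i\}$.
For $\hat k_i = 39.1$, $\hat\sigma = 6.5$, gave $k_{1755} \sim (41.3,\ )$.
Conclusion: even taking account of selection effects, $k_{1755}$ is probably much larger than $\bar k =$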
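
The posterior mean and variance formula can be checked numerically in a case where the answer is known in closed form. In the sketch below the derivatives of the log marginal density are taken by central finite differences, and the normal prior used for the check is an assumption for illustration only; in the slides $\hat l$ comes from a smoothed density of the khat values.

```python
import math

def tweedie_posterior(x, sigma, log_density, h=1e-2):
    """Approximate posterior mean and variance of mu given x.

    mean = x + sigma^2 * l'(x),  var = sigma^2 * (1 + sigma^2 * l''(x)),
    with l' and l'' from central finite differences of log_density.
    """
    l_minus, l_0, l_plus = log_density(x - h), log_density(x), log_density(x + h)
    d1 = (l_plus - l_minus) / (2 * h)
    d2 = (l_plus - 2 * l_0 + l_minus) / (h * h)
    return x + sigma ** 2 * d1, sigma ** 2 * (1 + sigma ** 2 * d2)

# Check case (assumed): mu ~ N(0, A) and x | mu ~ N(mu, sigma^2),
# so the marginal of x is N(0, A + sigma^2).
A, sigma = 4.0, 1.0

def log_marginal(x):
    v = A + sigma ** 2
    return -0.5 * x * x / v - 0.5 * math.log(2 * math.pi * v)

mean, var = tweedie_posterior(2.0, sigma, log_marginal)
# Closed form for comparison: mean = x * A / (A + sigma^2), var = sigma^2 * A / (A + sigma^2)
```

With a normal prior the formula reproduces the usual shrinkage estimate, which is a convenient sanity check before applying it to an estimated density.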

23 More General Model for $\mathrm{fdr}_i(z)$
Method 2: multiclass Bayes model $\pi_{i0}, f_{i0}(z), f_{i1}(z)$ with all $f_{i0} = f_0$, but drop the assumption that the non-null distributions $f_{i1}(z)$ are the same.
Define $w_i(z) = \Pr\{C_i \mid z\}$ (empirical Bayes of the $C_i$ indicator on $z_m$).
$\mathrm{fdr}_i(z) \approx \mathrm{fdr}(z)\, w_i(0) / w_i(z)$
Estimate $w_i(z)$ by logistic regression.
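
Once a logistic regression for $w_i(z)$ has been fitted, the Method 2 adjustment is a one-line ratio. The sketch below skips the fitting step and assumes polynomial coefficients are already in hand; the coefficient values, the constant combined fdr used in the example, and the cap at 1 are all hypothetical.

```python
import math

def w_logistic(z, coefs):
    """w_i(z) = 1 / (1 + exp(-(b0 + b1*z + b2*z^2 + ...)))."""
    eta = sum(b * z ** p for p, b in enumerate(coefs))
    return 1.0 / (1.0 + math.exp(-eta))

def fdr_method2(z, fdr_combined, coefs):
    """fdr_i(z) ~ fdr(z) * w_i(0) / w_i(z), capped at 1 (the cap is an assumption)."""
    ratio = w_logistic(0.0, coefs) / w_logistic(z, coefs)
    return min(1.0, fdr_combined(z) * ratio)

coefs = [-6.0, 0.0, 0.3]            # hypothetical fitted coefficients (b0, b1, b2)
fdr_combined = lambda z: 0.5        # hypothetical combined fdr, constant for illustration
adjusted = fdr_method2(3.0, fdr_combined, coefs)
```

When the class is over-represented at a given z, so that $w_i(z) > w_i(0)$, the class-specific fdr is smaller than the combined one.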

24 [Figure] First panel: z-values for positions 1750:1759 (solid) compared to all the other positions (line); axes: z values (low cnv to high cnv), frequency. Second panel: logistic regression estimate of wi(z) = prob{1750:1759 | z}; axes: z value, wi(z)/wi(0).

25 [Figure] Three estimates of fdrhat for positions 1750:1759: Method 1, Method 2, combined. Axes: z value (low cn to high cn), fdr estimate.

26 References
Efron, B. (2008). Simultaneous inference: When should hypothesis testing problems be combined? Ann. Appl. Statist.
Tibshirani, R. and Wang, P. (2008). Spatial smoothing and hot spot detection for CGH data using the fused lasso. Biostatistics.
Walther, G. (2009). Optimal and fast detection of spatial clusters with scan statistics. Online, URL gwalther/.
Wang, P., Kim, Y., Pollack, J., Narasimhan, B. and Tibshirani, R. (2005). A method for calling gains and losses in array CGH data. Biostatistics.
Zhang, N., Siegmund, D., Ji, H. and Li, J. (2009). Detecting simultaneous change-points in multiple sequences. Biometrika. Accepted for publication, URL nzhang/.


More information

Institutional Ranking. VHA Study

Institutional Ranking. VHA Study Statistical Inference for Ranks of Health Care Facilities in the Presence of Ties and Near Ties Minge Xie Department of Statistics Rutgers, The State University of New Jersey Supported in part by NSF,

More information

Unsupervised Discovery of Emphysema Subtypes in a Large Clinical Cohort

Unsupervised Discovery of Emphysema Subtypes in a Large Clinical Cohort Unsupervised Discovery of Emphysema Subtypes in a Large Clinical Cohort Polina Binder 1(B), Nematollah K. Batmanghelich 2, Raul San Jose Estepar 2, and Polina Golland 1 1 Computer Science and Artificial

More information

Harvard University. A Pseudolikelihood Approach for Simultaneous Analysis of Array Comparative Genomic Hybridizations (acgh)

Harvard University. A Pseudolikelihood Approach for Simultaneous Analysis of Array Comparative Genomic Hybridizations (acgh) Harvard University Harvard University Biostatistics Working Paper Series Year 2005 Paper 30 A Pseudolikelihood Approach for Simultaneous Analysis of Array Comparative Genomic Hybridizations (acgh) David

More information

A Strategy for Identifying Putative Causes of Gene Expression Variation in Human Cancer

A Strategy for Identifying Putative Causes of Gene Expression Variation in Human Cancer A Strategy for Identifying Putative Causes of Gene Expression Variation in Human Cancer Hautaniemi, Sampsa; Ringnér, Markus; Kauraniemi, Päivikki; Kallioniemi, Anne; Edgren, Henrik; Yli-Harja, Olli; Astola,

More information

Hierarchical Bayesian Modeling of Individual Differences in Texture Discrimination

Hierarchical Bayesian Modeling of Individual Differences in Texture Discrimination Hierarchical Bayesian Modeling of Individual Differences in Texture Discrimination Timothy N. Rubin (trubin@uci.edu) Michael D. Lee (mdlee@uci.edu) Charles F. Chubb (cchubb@uci.edu) Department of Cognitive

More information

Review. Imagine the following table being obtained as a random. Decision Test Diseased Not Diseased Positive TP FP Negative FN TN

Review. Imagine the following table being obtained as a random. Decision Test Diseased Not Diseased Positive TP FP Negative FN TN Outline 1. Review sensitivity and specificity 2. Define an ROC curve 3. Define AUC 4. Non-parametric tests for whether or not the test is informative 5. Introduce the binormal ROC model 6. Discuss non-parametric

More information

PSYCHOLOGY 300B (A01) One-sample t test. n = d = ρ 1 ρ 0 δ = d (n 1) d

PSYCHOLOGY 300B (A01) One-sample t test. n = d = ρ 1 ρ 0 δ = d (n 1) d PSYCHOLOGY 300B (A01) Assignment 3 January 4, 019 σ M = σ N z = M µ σ M d = M 1 M s p d = µ 1 µ 0 σ M = µ +σ M (z) Independent-samples t test One-sample t test n = δ δ = d n d d = µ 1 µ σ δ = d n n = δ

More information

EPS 625 INTERMEDIATE STATISTICS TWO-WAY ANOVA IN-CLASS EXAMPLE (FLEXIBILITY)

EPS 625 INTERMEDIATE STATISTICS TWO-WAY ANOVA IN-CLASS EXAMPLE (FLEXIBILITY) EPS 625 INTERMEDIATE STATISTICS TO-AY ANOVA IN-CLASS EXAMPLE (FLEXIBILITY) A researcher conducts a study to evaluate the effects of the length of an exercise program on the flexibility of female and male

More information

Chapter 11: Advanced Remedial Measures. Weighted Least Squares (WLS)

Chapter 11: Advanced Remedial Measures. Weighted Least Squares (WLS) Chapter : Advanced Remedial Measures Weighted Least Squares (WLS) When the error variance appears nonconstant, a transformation (of Y and/or X) is a quick remedy. But it may not solve the problem, or it

More information

Ecological Statistics

Ecological Statistics A Primer of Ecological Statistics Second Edition Nicholas J. Gotelli University of Vermont Aaron M. Ellison Harvard Forest Sinauer Associates, Inc. Publishers Sunderland, Massachusetts U.S.A. Brief Contents

More information

Comparing Two ROC Curves Independent Groups Design

Comparing Two ROC Curves Independent Groups Design Chapter 548 Comparing Two ROC Curves Independent Groups Design Introduction This procedure is used to compare two ROC curves generated from data from two independent groups. In addition to producing a

More information

Previously, when making inferences about the population mean,, we were assuming the following simple conditions:

Previously, when making inferences about the population mean,, we were assuming the following simple conditions: Chapter 17 Inference about a Population Mean Conditions for inference Previously, when making inferences about the population mean,, we were assuming the following simple conditions: (1) Our data (observations)

More information

Author summary. Introduction

Author summary. Introduction A Probabilistic Palimpsest Model of Visual Short-term Memory Loic Matthey 1,, Paul M Bays 2,3, Peter Dayan 1 1 Gatsby Computational Neuroscience Unit, University College London, London, United Kingdom

More information

Application of Local Control Strategy in analyses of the effects of Radon on Lung Cancer Mortality for 2,881 US Counties

Application of Local Control Strategy in analyses of the effects of Radon on Lung Cancer Mortality for 2,881 US Counties Application of Local Control Strategy in analyses of the effects of Radon on Lung Cancer Mortality for 2,881 US Counties Bob Obenchain, Risk Benefit Statistics, August 2015 Our motivation for using a Cut-Point

More information

A point estimate is a single value that has been calculated from sample data to estimate the unknown population parameter. s Sample Standard Deviation

A point estimate is a single value that has been calculated from sample data to estimate the unknown population parameter. s Sample Standard Deviation 7.1 Margins of Error and Estimates What is estimation? A point estimate is a single value that has been calculated from sample data to estimate the unknown population parameter. Population Parameter Sample

More information

Psychology Research Process

Psychology Research Process Psychology Research Process Logical Processes Induction Observation/Association/Using Correlation Trying to assess, through observation of a large group/sample, what is associated with what? Examples:

More information

Comparing multiple proportions

Comparing multiple proportions Comparing multiple proportions February 24, 2017 psych10.stanford.edu Announcements / Action Items Practice and assessment problem sets will be posted today, might be after 5 PM Reminder of OH switch today

More information

Chapter 1: Exploring Data

Chapter 1: Exploring Data Chapter 1: Exploring Data Key Vocabulary:! individual! variable! frequency table! relative frequency table! distribution! pie chart! bar graph! two-way table! marginal distributions! conditional distributions!

More information

Introduction & Basics

Introduction & Basics CHAPTER 1 Introduction & Basics 1.1 Statistics the Field... 1 1.2 Probability Distributions... 4 1.3 Study Design Features... 9 1.4 Descriptive Statistics... 13 1.5 Inferential Statistics... 16 1.6 Summary...

More information

What do you think of the following research? I m interested in whether a low glycemic index diet gives better control of diabetes than a high

What do you think of the following research? I m interested in whether a low glycemic index diet gives better control of diabetes than a high What do you think of the following research? I m interested in whether a low glycemic index diet gives better control of diabetes than a high glycemic index diet. So I randomly assign 100 people with type

More information

PubHlth Introductory Biostatistics Practice Test I (Without Unit 3 Questions)

PubHlth Introductory Biostatistics Practice Test I (Without Unit 3 Questions) 1 PubHlth 540 - Introductory Biostatistics Practice Test I (Without Unit 3 Questions) 1. (10 points) In the Honolulu Heart Study, Systolic Blood Pressure was tabulated for 100 Subjects including 37 Smokers

More information

AP Statistics TOPIC A - Unit 2 MULTIPLE CHOICE

AP Statistics TOPIC A - Unit 2 MULTIPLE CHOICE AP Statistics TOPIC A - Unit 2 MULTIPLE CHOICE Name Date 1) True or False: In a normal distribution, the mean, median and mode all have the same value and the graph of the distribution is symmetric. 2)

More information

Business Statistics Probability

Business Statistics Probability Business Statistics The following was provided by Dr. Suzanne Delaney, and is a comprehensive review of Business Statistics. The workshop instructor will provide relevant examples during the Skills Assessment

More information

Original Article Downloaded from jhs.mazums.ac.ir at 22: on Friday October 5th 2018 [ DOI: /acadpub.jhs ]

Original Article Downloaded from jhs.mazums.ac.ir at 22: on Friday October 5th 2018 [ DOI: /acadpub.jhs ] Iranian journal of health sciences 213;1(3):58-7 http://jhs.mazums.ac.ir Original Article Downloaded from jhs.mazums.ac.ir at 22:2 +33 on Friday October 5th 218 [ DOI: 1.18869/acadpub.jhs.1.3.58 ] A New

More information

Identifying Change Points in a Covariate Effect on Time-to-Event Analysis with Reduced Isotonic Regression

Identifying Change Points in a Covariate Effect on Time-to-Event Analysis with Reduced Isotonic Regression RESEARCH ARTICLE Identifying Change Points in a Covariate Effect on Time-to-Event Analysis with Reduced Isotonic Regression Yong Ma 1,2 *, Yinglei Lai 1,3, John M. Lachin 1,2 1. The Biostatistics Center,

More information

Early Learning vs Early Variability 1.5 r = p = Early Learning r = p = e 005. Early Learning 0.

Early Learning vs Early Variability 1.5 r = p = Early Learning r = p = e 005. Early Learning 0. The temporal structure of motor variability is dynamically regulated and predicts individual differences in motor learning ability Howard Wu *, Yohsuke Miyamoto *, Luis Nicolas Gonzales-Castro, Bence P.

More information