Application of Resampling Methods in Microarray Data Analysis


Slide 1: Application of Resampling Methods in Microarray Data Analysis: Tests for two independent samples
Oliver Hartmann, Helmut Schäfer
Institut für Medizinische Biometrie und Epidemiologie (Institute of Medical Biometry and Epidemiology), Philipps-Universität Marburg

Slide 2: Overview
Tests for two independent samples:
- Westfall & Young step-down resampling (Dudoit et al.)
  - application to different data sets
- estimation of the false discovery rate (Tusher et al.)
  - application to different data sets
- summary
- other methods proposed

Slide 3: Properties of Gene Expression Data
- lots of variables (4608)
- variables (genes) are highly correlated
- usually a very small number of replicates (between 2 and 50 chips)
- data do not follow a normal distribution (many outliers, heavy tails)
- lots of nonsense measurements
For all tests: quality-filtered, standardised log-ratios from cDNA arrays.

Slide 4: A typical question
Which genes are up- or down-regulated in cancerous tissue, or between different cancer sub-types?
What does the biologist want?
- a list of candidate genes
- a feeling for how good the list is
They will confirm the results, and they will add other biological knowledge.

Slide 5: Westfall & Young step-down (Dudoit et al. 2000)
- based on the t-statistic; adjusted p-values are estimated by permutation
- strong control of the FWER
- takes into account that the data are correlated
- assumption: all genes have (asymptotically) the same null distribution, so that p-values are monotone in the observed t-statistics across genes

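Slide 6: W & Y permutation algorithm
Order the observed statistics by absolute value, $|t_{(1)}| \ge |t_{(2)}| \ge \dots \ge |t_{(k)}|$. For $b = 1, \dots, B$:
1. Permute the data (divide the arrays into a fake control and a fake treatment group).
2. Compute the t-statistic $t_j^{(b)}$ for each gene $j$.
3. Compute the successive maxima for the $k$ genes, keeping the observed ordering ($t_{(j)}^{(b)}$ is the permuted statistic of the gene ranked $j$ in the observed data):
$$u_k^{(b)} = |t_{(k)}^{(b)}|, \qquad u_j^{(b)} = \max\left(u_{j+1}^{(b)},\ |t_{(j)}^{(b)}|\right), \quad j = k-1, \dots, 1.$$
After repeating this B times, calculate
$$\tilde p^*_{(j)} = \frac{1}{B} \sum_{b=1}^{B} I\left(u_j^{(b)} \ge |t_{(j)}|\right)$$
with monotonicity constraints: $\tilde p_{(1)} = \tilde p^*_{(1)}$ and $\tilde p_{(j)} = \max\left(\tilde p_{(j-1)},\ \tilde p^*_{(j)}\right)$ for $j = 2, \dots, k$.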
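A minimal NumPy sketch of the step-down algorithm above, assuming genes in rows, arrays in columns, a Welch-type t-statistic and B random (rather than all) permutations. The function names `welch_t` and `maxt_adjusted_p` are illustrative, not taken from any package; reversing the array and applying `np.maximum.accumulate` produces the successive maxima $u_j^{(b)}$.

```python
import numpy as np

def welch_t(x, labels):
    """Welch-type two-sample t-statistic per gene (row of x); labels: bool, True = group 1."""
    g1, g2 = x[:, labels], x[:, ~labels]
    se = np.sqrt(g1.var(axis=1, ddof=1) / g1.shape[1] + g2.var(axis=1, ddof=1) / g2.shape[1])
    return (g1.mean(axis=1) - g2.mean(axis=1)) / se

def maxt_adjusted_p(x, labels, B=1000, seed=0):
    """Step-down maxT adjusted p-values, estimated from B random permutations."""
    rng = np.random.default_rng(seed)
    t_obs = np.abs(welch_t(x, labels))
    order = np.argsort(-t_obs)              # order[0] = gene with the largest |t|
    t_sorted = t_obs[order]                 # |t_(1)| >= ... >= |t_(k)|
    counts = np.zeros(len(t_obs))
    for _ in range(B):
        perm = rng.permutation(labels)                  # step 1: fake control/treatment split
        t_b = np.abs(welch_t(x, perm))[order]           # step 2, kept in the observed ranking
        u = np.maximum.accumulate(t_b[::-1])[::-1]      # step 3: u_j = max(u_{j+1}, |t_(j)^(b)|)
        counts += u >= t_sorted
    p_star = counts / B                                 # unconstrained p*_(j)
    p_sorted = np.maximum.accumulate(p_star)            # monotonicity constraint
    p_adj = np.empty_like(p_sorted)
    p_adj[order] = p_sorted                             # back to the original gene order
    return p_adj
```

For real analyses, the Bioconductor package multtest (by Dudoit and co-authors) provides `mt.maxT`; the sketch above only mirrors the slide's steps.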

Slide 7: Comparing W&Y to other methods
Unadjusted p-values:
1. Wilcoxon rank-sum test
2. estimation of the p-value for the t-statistic via resampling/permutation:
$$p_j^* = \frac{1}{B} \sum_{b=1}^{B} I\left(|t_j^{(b)}| \ge |t_j|\right)$$
Adjusted p-values: Šidák single-step adjustment
$$\tilde p_j = 1 - (1 - p_j)^k$$
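A sketch of the second unadjusted option (per-gene permutation p-values) followed by the Šidák single-step adjustment, under the same assumptions as before: genes in rows, a Welch-type t-statistic, B random permutations, and illustrative function names. Wilcoxon rank-sum p-values (option 1) could instead be taken from `scipy.stats.mannwhitneyu`; that route is not shown here.

```python
import numpy as np

def welch_t(x, labels):
    """Welch-type two-sample t-statistic per gene (row of x); labels: bool, True = group 1."""
    g1, g2 = x[:, labels], x[:, ~labels]
    se = np.sqrt(g1.var(axis=1, ddof=1) / g1.shape[1] + g2.var(axis=1, ddof=1) / g2.shape[1])
    return (g1.mean(axis=1) - g2.mean(axis=1)) / se

def permutation_p(x, labels, B=1000, seed=0):
    """Unadjusted p_j^*: share of B random relabelings with |t_j^(b)| >= |t_j|."""
    rng = np.random.default_rng(seed)
    t_obs = np.abs(welch_t(x, labels))
    exceed = np.zeros(x.shape[0])
    for _ in range(B):
        exceed += np.abs(welch_t(x, rng.permutation(labels))) >= t_obs
    return exceed / B

def sidak(p, k=None):
    """Sidak single-step adjustment 1 - (1 - p_j)^k; k defaults to the number of p-values."""
    p = np.asarray(p, dtype=float)
    k = p.size if k is None else k
    return 1.0 - (1.0 - p) ** k

# usage on simulated data (100 genes, 6 vs. 6 arrays)
rng = np.random.default_rng(1)
x = rng.normal(size=(100, 12))
labels = np.array([True] * 6 + [False] * 6)
raw = permutation_p(x, labels, B=500)
adj = sidak(raw)
```

Unlike the maxT procedure, each gene is permuted in isolation here, so the correlation between genes does not enter the adjustment.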

Slide 8: Assumptions for Šidák
- assumes independence of the tests!
- provides only weak control of the FWER
- the method does not account for the correlation structure
- without the independence assumption, $\tilde p_j \le 1 - (1 - p_j)^k$ still holds under very general conditions
- alternatives (but much worse): Bonferroni single-step, $\tilde p_j = \min(k\,p_j,\ 1)$; Bonferroni-Holm step-down
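To make the comparison concrete, a small sketch of the Bonferroni single-step and Bonferroni-Holm step-down adjustments next to Šidák, applied to a made-up vector of four raw p-values (illustrative only, not taken from the data sets).

```python
import numpy as np

def sidak(p):
    """Sidak single-step: 1 - (1 - p_j)^k."""
    p = np.asarray(p, dtype=float)
    return 1.0 - (1.0 - p) ** p.size

def bonferroni(p):
    """Bonferroni single-step: min(k * p_j, 1)."""
    p = np.asarray(p, dtype=float)
    return np.minimum(p.size * p, 1.0)

def holm(p):
    """Bonferroni-Holm step-down: (k - j + 1) * p_(j), made monotone, capped at 1."""
    p = np.asarray(p, dtype=float)
    k = p.size
    order = np.argsort(p)                          # smallest p-value first
    stepped = (k - np.arange(k)) * p[order]        # multipliers k, k-1, ..., 1
    adj_sorted = np.minimum(np.maximum.accumulate(stepped), 1.0)
    adj = np.empty(k)
    adj[order] = adj_sorted                        # back to the original order
    return adj

p = np.array([0.01, 0.02, 0.03, 0.5])
print("Sidak:     ", sidak(p))
print("Bonferroni:", bonferroni(p))
print("Holm:      ", holm(p))
```

For these values Bonferroni gives 0.04, 0.08, 0.12, 1.0 and Holm gives 0.04, 0.06, 0.06, 0.5. Since $1 - (1 - p)^k \le k\,p$, the Šidák values are never larger than the Bonferroni ones.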

Slide 9: Description of Datasets
- two cancer sub-types (mutated gene):
  - 10 vs. 11 (with 2304 genes)
  - 10 vs. 11 (with 4608 genes)
  - 13 vs. 19
- cancer vs. normal tissue: 10 vs. 9
- two cancer sub-types (clinical parameter): 14 vs. 18

Slide 10: Mutated Gene, 10 vs. 11 (2.3k genes)

Slide 11: Mutated Gene, 10 vs. 11

Slide 12: Mutated Gene, 13 vs. 19

Slide 13: Cancer vs. Normal, 10 vs. 9

Slide 14: Clinical Parameter, 14 vs. 18

Slide 15: How many permutations?
Example: Mutated Gene, 13 vs. 19
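As background to the question of how many permutations are needed, the sketch below counts the distinct group relabelings for each design on slide 9 and the binomial Monte Carlo standard error of a p-value estimated from B random permutations. The helper names are illustrative; the slide's own answer is given in its figure.

```python
from math import comb, sqrt

def n_splits(n1, n2):
    """Number of distinct relabelings of n1 + n2 arrays into groups of sizes n1 and n2."""
    return comb(n1 + n2, n1)

def mc_standard_error(p, B):
    """Binomial standard error of a p-value estimated from B random permutations."""
    return sqrt(p * (1 - p) / B)

# group sizes from the data set description
for n1, n2 in [(10, 11), (13, 19), (10, 9), (14, 18)]:
    print(f"{n1} vs. {n2}: {n_splits(n1, n2):,} distinct splits")

print(f"SE at p = 0.05, B = 1000: {mc_standard_error(0.05, 1000):.4f}")
```

For the 13 vs. 19 example there are 347,373,600 distinct splits, so complete enumeration is impractical and B random permutations are drawn instead; the estimated p-values then carry a Monte Carlo error that shrinks like $1/\sqrt{B}$.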

Slide 16: False discovery rate (based on Benjamini & Hochberg 1995)
- R = number of all rejected null hypotheses
- V = number of false rejections
- V/R = proportion of false discoveries
- $\mathrm{FDR} = E(V/R) \le \mathrm{FWER}$ (with V/R := 0 if R = 0); remember: $\mathrm{FWER} = P(V \ge 1)$
- the FDR-controlling procedure of B&H is not applicable here (it needs independent test statistics)
- method proposed by Tusher et al. for small sample sizes

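Slide 17: Estimation of the FDR
Tusher et al. (2001):
- the aim is to estimate the FDR, not to control it at a given error rate
- does not assume independence
- assumes all null hypotheses are true (weak control)
- the estimated FDR seems a plausible approximation of a strongly controlled FDR
d-statistic:
$$d = \frac{\bar x_1 - \bar x_2}{s + s_0}, \qquad s = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}, \qquad s_0 > 0$$
The positive constant $s_0$ keeps genes with a very small standard error $s$ from producing huge statistics.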
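A short NumPy sketch of the d-statistic as written above, assuming genes in rows and a boolean group vector; `d_statistic` is an illustrative name. The toy second gene has a tiny mean difference and an even tinier variance, which is exactly the case $s_0$ is meant to damp. The value $s_0 = 0.08$ is borrowed from slide 21.

```python
import numpy as np

def d_statistic(x, labels, s0):
    """d = (mean1 - mean2) / (s + s0) per gene (row of x), with s = sqrt(s1^2/n1 + s2^2/n2)."""
    if s0 <= 0:
        raise ValueError("s0 must be positive")
    g1, g2 = x[:, labels], x[:, ~labels]
    s = np.sqrt(g1.var(axis=1, ddof=1) / g1.shape[1] + g2.var(axis=1, ddof=1) / g2.shape[1])
    return (g1.mean(axis=1) - g2.mean(axis=1)) / (s + s0)

x = np.array([[1.0, 2.0, 3.0, 4.0, 5.0, 6.0],          # ordinary gene
              [1.00, 1.01, 1.00, 1.02, 1.03, 1.02]])   # tiny difference, tiny variance
labels = np.array([True, True, True, False, False, False])
print(d_statistic(x, labels, s0=0.08))
```

For the second gene, $s \approx 0.0047$, so a plain t-statistic would be about -4.2, while $d$ with $s_0 = 0.08$ is only about -0.24.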

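Slide 18: Estimation of the FDR
Order the observed statistics by magnitude, $d_{(1)} \ge d_{(2)} \ge \dots$. The idea is to
- calculate the expected order statistics $d_{E,(i)}$ from all possible permutations,
- compare the observed $d_{(i)}$ with $d_{E,(i)}$ and call a gene significant if $d_{(i)} > d_{E,(i)} + \Delta$.
Algorithm:
- count the number of genes with $d_{(i)} > d_{E,(i)} + \Delta$ for the setting of interest (the observed group labels),
- count the same for all permutations of the data,
- estimated FDR = (median number of genes called significant in the permutations) / (number of genes called significant in the observed data).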
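A simplified sketch of this estimator, following the slide's one-sided rule $d_{(i)} > d_{E,(i)} + \Delta$. Assumptions: B random permutations replace "all possible permutations", $d_{E,(i)}$ is the mean of the sorted permuted statistics, and `d_stat` and `sam_fdr` are hypothetical names. The published SAM procedure defines its cutoffs somewhat differently, so this is not a reimplementation of it.

```python
import numpy as np

def d_stat(x, labels, s0):
    """d-statistic per gene (row of x); labels: bool, True = group 1."""
    g1, g2 = x[:, labels], x[:, ~labels]
    s = np.sqrt(g1.var(axis=1, ddof=1) / g1.shape[1] + g2.var(axis=1, ddof=1) / g2.shape[1])
    return (g1.mean(axis=1) - g2.mean(axis=1)) / (s + s0)

def sam_fdr(x, labels, delta, s0=0.1, B=200, seed=0):
    """Return (genes called significant, estimated FDR) for threshold delta."""
    rng = np.random.default_rng(seed)
    d_obs = np.sort(d_stat(x, labels, s0))[::-1]          # d_(1) >= d_(2) >= ...
    d_perm = np.empty((B, x.shape[0]))
    for b in range(B):
        perm = rng.permutation(labels)                     # fake control/treatment split
        d_perm[b] = np.sort(d_stat(x, perm, s0))[::-1]
    d_exp = d_perm.mean(axis=0)                            # expected order statistics d_E,(i)
    called = int(np.sum(d_obs > d_exp + delta))            # observed data
    false_calls = np.sum(d_perm > d_exp + delta, axis=1)   # same rule, each permutation
    if called == 0:
        return 0, 0.0
    return called, float(np.median(false_calls)) / called
```

Reporting the 90th percentile of `false_calls` instead of the median gives the second FDR figure shown on the next slide.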

Slide 19: Results from FDR estimation
Δ | Genes called sig. | FDR, 2.3k (median / 90th percentile) | FDR, 4.6k (median / 90th percentile) | FDR, 4.6k | FDR, 4.6k, 10-9 | FDR, 4.6k
  |   | / 32 % | 0 / 0 % | 0 % | 0 % | 0 %
  |   | / 53 % | 0 / 3 % | 0 % | 0 % | 0 %
  |   | / 60 % | 1.1 / 6 % | 0 % | 0 % | 0 %
  |   | / 63 % | 1.4 / 8 % | 0.8 % | 0 % | 0.9 %
  |   | / 89 % | 5 / 16 % | 2.9 % | 2.8 % | 3.8 %

Slide 20: Does the statistic matter?
Comparing p-values: W & Y vs. Wilcoxon
Example: Mutated Gene, 13 vs. 19

Slide 21: Does the statistic matter?
Comparing the t-statistic vs. the d-statistic
Example: Mutated Gene, 13 vs. 19 ($s_0 = 0.08$)

Slide 22: Summary
- Westfall & Young
  - works at medium cost for moderate to large sample sizes
  - better than Wilcoxon / univariate resampling + Šidák
- Tusher et al.
  - the only one that works for small sample sizes
- t-statistic only good for moderate/large sample sizes
- d-statistic better for small sample sizes
- best?? Looking at different statistics may be useful.

Slide 23: Other methods proposed
- ANOVA (with resampling; data used without prior standardisation)
- Newton et al. (has been extended to multiple chips)
- Grant et al. (PaGE, false-positive rate)
- Troendle et al. (control of the FDP)
- other statistics: scores (Ben-Dor et al.; Park et al.), maximum likelihood (Ideker et al.), ...


Introduction to statistics Dr Alvin Vista, ACER Bangkok, 14-18, Sept. 2015 Analysing and Understanding Learning Assessment for Evidence-based Policy Making Introduction to statistics Dr Alvin Vista, ACER Bangkok, 14-18, Sept. 2015 Australian Council for Educational Research Structure

More information

Degree Title. Mass-Univariate Hypothesis Testing on MEEG Data using Cross-Validation

Degree Title. Mass-Univariate Hypothesis Testing on MEEG Data using Cross-Validation Master s Degree in Cognitive Science Degree Title Mass-Univariate Hypothesis Testing on MEEG Data using Cross-Validation Tutor Nathan Weisz, Emanuele Olivetti, Paolo Avesani Student Seyed Mostafa Kia Academic

More information

A quick review. The clustering problem: Hierarchical clustering algorithm: Many possible distance metrics K-mean clustering algorithm:

A quick review. The clustering problem: Hierarchical clustering algorithm: Many possible distance metrics K-mean clustering algorithm: The clustering problem: partition genes into distinct sets with high homogeneity and high separation Hierarchical clustering algorithm: 1. Assign each object to a separate cluster. 2. Regroup the pair

More information

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo Please note the page numbers listed for the Lind book may vary by a page or two depending on which version of the textbook you have. Readings: Lind 1 11 (with emphasis on chapters 5, 6, 7, 8, 9 10 & 11)

More information

What you should know before you collect data. BAE 815 (Fall 2017) Dr. Zifei Liu

What you should know before you collect data. BAE 815 (Fall 2017) Dr. Zifei Liu What you should know before you collect data BAE 815 (Fall 2017) Dr. Zifei Liu Zifeiliu@ksu.edu Types and levels of study Descriptive statistics Inferential statistics How to choose a statistical test

More information

The t-test: Answers the question: is the difference between the two conditions in my experiment "real" or due to chance?

The t-test: Answers the question: is the difference between the two conditions in my experiment real or due to chance? The t-test: Answers the question: is the difference between the two conditions in my experiment "real" or due to chance? Two versions: (a) Dependent-means t-test: ( Matched-pairs" or "one-sample" t-test).

More information

STATISTICAL INFERENCE 1 Richard A. Johnson Professor Emeritus Department of Statistics University of Wisconsin

STATISTICAL INFERENCE 1 Richard A. Johnson Professor Emeritus Department of Statistics University of Wisconsin STATISTICAL INFERENCE 1 Richard A. Johnson Professor Emeritus Department of Statistics University of Wisconsin Key words : Bayesian approach, classical approach, confidence interval, estimation, randomization,

More information

Hierarchy of Statistical Goals

Hierarchy of Statistical Goals Hierarchy of Statistical Goals Ideal goal of scientific study: Deterministic results Determine the exact value of a ment or population parameter Prediction: What will the value of a future observation

More information

SPOTTING THEM AND AVOIDING THEM COMMON MISTAKES IN STATISTICS

SPOTTING THEM AND AVOIDING THEM COMMON MISTAKES IN STATISTICS NOTES FOR SUMMER STATISTICS INSTITUTE COURSE COMMON MISTAKES IN STATISTICS SPOTTING THEM AND AVOIDING THEM Day 4: Common Mistakes Based on Common Misunderstandings about Statistical Inference MAY 22 25,

More information

Nature Methods: doi: /nmeth.3115

Nature Methods: doi: /nmeth.3115 Supplementary Figure 1 Analysis of DNA methylation in a cancer cohort based on Infinium 450K data. RnBeads was used to rediscover a clinically distinct subgroup of glioblastoma patients characterized by

More information

Practical Statistical Reasoning in Clinical Trials

Practical Statistical Reasoning in Clinical Trials Seminar Series to Health Scientists on Statistical Concepts 2011-2012 Practical Statistical Reasoning in Clinical Trials Paul Wakim, PhD Center for the National Institute on Drug Abuse 10 January 2012

More information

Readings: Textbook readings: OpenStax - Chapters 1 13 (emphasis on Chapter 12) Online readings: Appendix D, E & F

Readings: Textbook readings: OpenStax - Chapters 1 13 (emphasis on Chapter 12) Online readings: Appendix D, E & F Readings: Textbook readings: OpenStax - Chapters 1 13 (emphasis on Chapter 12) Online readings: Appendix D, E & F Plous Chapters 17 & 18 Chapter 17: Social Influences Chapter 18: Group Judgments and Decisions

More information

Basic Statistics for Comparing the Centers of Continuous Data From Two Groups

Basic Statistics for Comparing the Centers of Continuous Data From Two Groups STATS CONSULTANT Basic Statistics for Comparing the Centers of Continuous Data From Two Groups Matt Hall, PhD, Troy Richardson, PhD Comparing continuous data across groups is paramount in research and

More information

REACTIN: Regulatory activity inference of transcription factors underlying human diseases with application to breast cancer

REACTIN: Regulatory activity inference of transcription factors underlying human diseases with application to breast cancer Zhu et al. BMC Genomics 2013, 14:504 METHODOLOGY ARTICLE Open Access REACTIN: Regulatory activity inference of transcription factors underlying human diseases with application to breast cancer Mingzhu

More information

Lecture Outline Biost 517 Applied Biostatistics I. Statistical Goals of Studies Role of Statistical Inference

Lecture Outline Biost 517 Applied Biostatistics I. Statistical Goals of Studies Role of Statistical Inference Lecture Outline Biost 517 Applied Biostatistics I Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Statistical Inference Role of Statistical Inference Hierarchy of Experimental

More information

Empirical Bayes Identication of Tumor Progression Genes from Microarray Data

Empirical Bayes Identication of Tumor Progression Genes from Microarray Data 68 Biometrical Journal 49 (2007) 1, 68 77 DOI: 10.1002/bimj.200610312 Empirical Bayes Identication of Tumor Progression Genes from Microarray Data Debashis Ghosh *,1 and Arul M. Chinnaiyan 2 1 Department

More information

Strength of functional signature correlates with effect size in autism

Strength of functional signature correlates with effect size in autism Ballouz and Gillis Genome Medicine (217) 9:64 DOI 1.1186/s1373-17-455-8 RESEARCH Open Access Strength of functional signature correlates with effect size in autism Sara Ballouz and Jesse Gillis * Abstract

More information

What do you think of the following research? I m interested in whether a low glycemic index diet gives better control of diabetes than a high

What do you think of the following research? I m interested in whether a low glycemic index diet gives better control of diabetes than a high What do you think of the following research? I m interested in whether a low glycemic index diet gives better control of diabetes than a high glycemic index diet. So I randomly assign 100 people with type

More information

Best (but oft-forgotten) practices: the multiple problems of multiplicity whether and how to correct for many statistical tests 1

Best (but oft-forgotten) practices: the multiple problems of multiplicity whether and how to correct for many statistical tests 1 AJCN. First published ahead of print August 5, 2015 as doi: 10.3945/ajcn.115.113548. Statistical Commentary Best (but oft-forgotten) practices: the multiple problems of multiplicity whether and how to

More information

Package CancerMutationAnalysis

Package CancerMutationAnalysis Type Package Package CancerMutationAnalysis Title Cancer mutation analysis Version 1.2.1 Author Giovanni Parmigiani, Simina M. Boca March 25, 2013 Maintainer Simina M. Boca Imports

More information

ANOVA in SPSS (Practical)

ANOVA in SPSS (Practical) ANOVA in SPSS (Practical) Analysis of Variance practical In this practical we will investigate how we model the influence of a categorical predictor on a continuous response. Centre for Multilevel Modelling

More information

Inferential Statistics

Inferential Statistics Inferential Statistics and t - tests ScWk 242 Session 9 Slides Inferential Statistics Ø Inferential statistics are used to test hypotheses about the relationship between the independent and the dependent

More information

Machine Learning for Personalized Medicine

Machine Learning for Personalized Medicine Department Biosystems Machine Learning for Personalized Medicine Karsten Borgwardt ETH Zürich, Department Biosystems Biozentrum Basel, April 25, 2016 Why do we need Machine Learning in Systems Biology

More information