Practical Experience in the Analysis of Gene Expression Data


1 Workshop Biometrical Analysis of Molecular Markers, Heidelberg, 2001
Practical Experience in the Analysis of Gene Expression Data from Two Data Sets concerning ALL in Children and Patients with Nodules in the Thyroid Glands
S. Kropf, Otto von Guericke University, Magdeburg
O. Kuß, U. Hattenhorst, S. Burdach, Martin Luther University, Halle
M. Eszlinger, K. Krohn, University of Leipzig

2 Contents: 1. Introduction, 2. Methods applied, 3. Data sets, 4. Results, 5. Summary
Kropf et al., Heidelberg, November

3 1. Introduction
One of the basic questions in the analysis of gene expression data is the detection of genes that are more or less strongly expressed in some cells than in others, e.g.:
patients with some disease vs. healthy persons (two-sample problem);
affected vs. unaffected tissue in the same persons (one-sample problem).

4 Statistical concern: the problem of multiple testing, since many hundreds or thousands of genes are considered in parallel.
In the literature and in our own proposals: methods that control the experimentwise (familywise) type I error.
Our own experience was based on small samples of low-dimensional arrays (macroarrays with 588 genes, each spotted twice).
Question: does that also work with high-dimensional arrays?

5 More detailed:
Is it statistically feasible to keep the experimentwise error rate? (Can we find significant variables, or is the requirement too high to be fulfilled in higher dimensions?)
Is it technically feasible, on a standard PC and with standard statistical software (SAS, SPSS)?
Are there alternative techniques or requirements?
What about data transformations?

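6 2. Methods applied
Methods to control the experimentwise type I error:
a) Bonferroni (Holm) method
α-adjustment: α / (number of genes − i), i = 0, 1, ..., per test (Holm: the p-values are checked in increasing order and testing stops at the first non-significant one, so i is the number of hypotheses already rejected; plain Bonferroni uses α / (number of genes) for every test)
easy to implement
small effect of the Holm correction
extremely small effective type I error levels per test
robustness questionable in the extreme tails (the per-test levels lie far out in the tails of the test distribution)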
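The α-adjustment above can be sketched in a few lines of Python. This is a minimal illustration, assuming the per-gene p-values have already been computed by some unadjusted test; the function names are hypothetical and the example p-values are invented.

```python
def bonferroni_rejections(pvalues, alpha=0.05):
    """Indices of the hypotheses rejected by the plain Bonferroni method."""
    m = len(pvalues)
    return [j for j, p in enumerate(pvalues) if p <= alpha / m]


def holm_rejections(pvalues, alpha=0.05):
    """Indices of the hypotheses rejected by Holm's step-down method.

    The (i+1)-th smallest p-value, i = 0, 1, ..., is compared with
    alpha / (m - i); testing stops at the first non-rejection.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda j: pvalues[j])
    rejected = []
    for i, j in enumerate(order):
        if pvalues[j] <= alpha / (m - i):
            rejected.append(j)
        else:
            break  # every remaining hypothesis is retained
    return sorted(rejected)


# Invented p-values for four genes
p = [0.01, 0.012, 0.02, 0.3]
print(bonferroni_rejections(p))  # [0, 1]
print(holm_rejections(p))        # [0, 1, 2]
```

With thousands of genes, the first Holm thresholds α/m, α/(m − 1), ... are practically identical to the Bonferroni threshold α/m. This is why the Holm correction has only a small effect in high dimensions.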

7 b) Permutation procedure of Westfall and Young (Westfall and Young, 1993; Dudoit et al., 2000)
The basic tests (here t tests) are first carried out on the original data, then repeatedly on permuted samples. In the permutation process a variable is not compared with itself. Instead, the t value of the most significant original variable is compared with that of the most significant variable in the b-th permutation sample. The t value of the second original variable (ordered by significance) is compared with the maximum over all variables except the one that was originally most significant, and so on.
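The successive-maximum comparison described above can be sketched as follows. This is a minimal Python illustration of the step-down maxT idea. It assumes pooled-variance t statistics and random relabelings of the samples (a Monte Carlo approximation rather than complete enumeration). The adjusted p-value is estimated as the fraction of relabelings whose successive maximum reaches the observed |t|. Function names and data are hypothetical; this is not the PROC MULTTEST code.

```python
import math
import random


def pooled_t(x, y):
    """Two-sample t statistic with pooled variance."""
    n1, n2 = len(x), len(y)
    m1, m2 = sum(x) / n1, sum(y) / n2
    ss = sum((v - m1) ** 2 for v in x) + sum((v - m2) ** 2 for v in y)
    return (m1 - m2) / math.sqrt(ss / (n1 + n2 - 2) * (1 / n1 + 1 / n2))


def abs_t_all(genes, labels):
    """|t| for every gene; genes[g][k] is gene g in sample k, labels[k] is 0 or 1."""
    result = []
    for row in genes:
        x = [v for v, lab in zip(row, labels) if lab == 1]
        y = [v for v, lab in zip(row, labels) if lab == 0]
        result.append(abs(pooled_t(x, y)))
    return result


def westfall_young_maxt(genes, labels, n_perm=1000, seed=0):
    """Step-down maxT adjusted p-values from n_perm random relabelings."""
    rng = random.Random(seed)
    t_obs = abs_t_all(genes, labels)
    # genes ordered from most to least significant in the original data
    order = sorted(range(len(genes)), key=lambda g: -t_obs[g])
    counts = [0] * len(genes)
    perm = list(labels)
    for _ in range(n_perm):
        rng.shuffle(perm)
        t_b = abs_t_all(genes, perm)
        # successive maxima, built from the least significant gene upwards:
        # at position i, running_max is the maximum over order[i], ..., order[-1],
        # i.e. over all genes except the i originally more significant ones
        running_max = 0.0
        for i in range(len(order) - 1, -1, -1):
            running_max = max(running_max, t_b[order[i]])
            if running_max >= t_obs[order[i]]:
                counts[i] += 1
    adjusted = [0.0] * len(genes)
    previous = 0.0
    for i, g in enumerate(order):
        # monotonicity: no adjusted p-value below that of a more significant gene
        previous = max(previous, counts[i] / n_perm)
        adjusted[g] = previous
    return adjusted


# Invented example: gene 0 has a large group shift, genes 1-5 are pure noise
rng = random.Random(1)
labels = [1] * 10 + [0] * 10
genes = [[k % 10 + (100.0 if k < 10 else 0.0) for k in range(20)]]
genes += [[rng.gauss(0.0, 1.0) for _ in range(20)] for _ in range(5)]
adj = westfall_young_maxt(genes, labels, n_perm=500, seed=2)
print([round(a, 3) for a in adj])
```

Because each gene is compared with the maximum over itself and all less significant genes, the correlation between genes is taken into account automatically. The price is computing all t statistics once per relabeling.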

8 Comments:
The procedure is distribution-free because of the permutation principle.
It uses a parametric test as its basic element, so its power does depend on the distribution.
The procedure is implemented in the SAS procedure PROC MULTTEST for the comparison of independent samples.
It is recommended by the working group of Terry Speed (University of California, Berkeley).

9 c) Parametric sequential procedure (Kropf, 2000), two-sample case
Based on the assumption of equal variances in all variables (genes):
1. Sort all variables by decreasing variance in the total sample (both groups pooled, group labels ignored).
2. Carry out unadjusted two-sample t tests in the order given by step 1 until the first non-significant test.
The procedure controls the experimentwise type I error if all variables are normally distributed with equal variance in both groups. The additional assumption of equal variances across all variables is not necessary for the type I error, but it is for good power. The one-sample version is similar (sort by decreasing mean squared difference from 0).
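Steps 1 and 2 can be sketched in Python as below. The sketch assumes pooled-variance t tests and data stored gene by gene. To stay within the standard library, the two-sided t p-value is computed with the closed-form finite series for integer degrees of freedom. All names and data are hypothetical; this is not the SPSS matrix-language program used in the talk.

```python
import math
import statistics


def t_two_sided_p(t, df):
    """Two-sided p-value of Student's t for integer df, via the
    closed-form finite series for P(|T| <= |t|)."""
    theta = math.atan(abs(t) / math.sqrt(df))
    s, c2 = math.sin(theta), math.cos(theta) ** 2
    if df % 2 == 0:
        term = total = 1.0
        for k in range(2, df, 2):
            term *= (k - 1) / k * c2
            total += term
        inside = s * total
    else:
        total = 0.0
        if df > 1:
            term = total = math.cos(theta)
            for k in range(3, df, 2):
                term *= (k - 1) / k * c2
                total += term
        inside = 2.0 / math.pi * (theta + s * total)
    return max(0.0, 1.0 - inside)


def parametric_sequential(genes, labels, alpha=0.05):
    """Order genes by total-sample variance, then run unadjusted pooled t tests
    until the first non-significant one. genes[g][k] is the (transformed) value
    of gene g in sample k; labels[k] is 0 or 1."""
    n1 = sum(labels)
    n2 = len(labels) - n1
    df = n1 + n2 - 2
    # step 1: decreasing variance in the total sample (group labels ignored)
    order = sorted(range(len(genes)), key=lambda g: -statistics.variance(genes[g]))
    significant = []
    for g in order:
        x = [v for v, lab in zip(genes[g], labels) if lab == 1]
        y = [v for v, lab in zip(genes[g], labels) if lab == 0]
        m1, m2 = sum(x) / n1, sum(y) / n2
        ss = sum((v - m1) ** 2 for v in x) + sum((v - m2) ** 2 for v in y)
        t = (m1 - m2) / math.sqrt(ss / df * (1 / n1 + 1 / n2))
        # step 2: unadjusted test at level alpha; stop at the first non-rejection
        if t_two_sided_p(t, df) <= alpha:
            significant.append(g)
        else:
            break
    return significant


# Invented data, 5 vs. 5 samples
labels = [1] * 5 + [0] * 5
gene_a = [10, 11, 12, 13, 14, 0, 1, 2, 3, 4]                 # large variance, real shift
gene_b = [0, 3, 1, 4, 2, 2, 4, 0, 3, 1]                      # medium variance, no shift
gene_c = [1.0, 1.1, 1.2, 1.1, 1.0, 0.0, 0.1, 0.2, 0.1, 0.0]  # small variance, real shift
print(parametric_sequential([gene_a, gene_b, gene_c], labels))  # [0]
print(parametric_sequential([gene_a, gene_c], labels))          # [0, 1]
```

In the first call the clearly shifted small-variance gene is never tested. A non-significant gene with a larger total variance stops the procedure before it is reached. This is the suppression of small-variance genes that the talk mentions for the transformed ALL data.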

10 d) Nonparametric sequential procedure, two-sample case
Also dependent on the assumption of equal variances (shape, interquartile range):
1. Sort the variables by decreasing interquartile range in the total sample.
2. Carry out Mann-Whitney U tests in that order until the first non-significant test.
The procedure utilizes the independence of ranks and order statistics. Equal variances of the variables are important for power, not for the type I error. Only in that respect does it depend on the distribution.

11 3. Data sets
A: acute lymphatic leukemia (ALL) in children (U. Hattenhorst, S. Burdach, O. Kuß, Halle)
11 children with ALL vs. 10 healthy persons
special Affymetrix® chip with … genes
genes selected by dropping all rows with an empty description field or with a description starting with EST
skewed distributions (across genes or patients)
also negative expression values, due to the background elimination and normalization process

12 ALL patients: distribution of expression levels across genes
[Figure: percentiles of the expression levels for each ALL patient: ALL02, ALL01, ALL15, ALL19, ALL22, ALL30, ALL31, ALL32, ALL25, ALL28, ALL29]

13 Two versions of a transformation:
1. logarithms; problem: negative values, so the values are shifted: ln(x + 200)
2. cube root
In both versions the variances of the expression values are not really equal across genes. Genes with small variances will be suppressed in the sequential procedure.
[Table: selection of seven genes with logarithmic transformation; mean and standard deviation in the control and ALL groups]
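Both transformations are applied element-wise. Below is a minimal Python sketch, assuming the shift of 200 from the slide, which only works if all raw values exceed −200, and using invented example values. It shows that the per-gene variances still differ after either transformation. That difference is what matters for the variance-ordered sequential procedure.

```python
import math
import statistics


def log_shift(x, shift=200.0):
    """Shifted natural logarithm ln(x + shift); defined only for x > -shift."""
    return math.log(x + shift)  # math.log raises ValueError if x + shift <= 0


def cube_root(x):
    """Real cube root that keeps the sign, so negative expression values are allowed."""
    return math.copysign(abs(x) ** (1.0 / 3.0), x)


# Invented expression values of two genes, including a negative value
genes = {
    "gene_1": [-50.0, 10.0, 120.0, 800.0, 2500.0],
    "gene_2": [30.0, 45.0, 60.0, 52.0, 38.0],
}
for name, values in genes.items():
    var_log = statistics.variance([log_shift(v) for v in values])
    var_cbrt = statistics.variance([cube_root(v) for v in values])
    print(name, round(var_log, 3), round(var_cbrt, 3))
```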

14 B: patients with nodules in the thyroid gland (Krohn, Eszlinger, Leipzig)
5 patients with hot nodules, 5 patients with cold nodules
for each patient, tissue from the nodule and from the unaffected surrounding tissue
… genes per tissue
two versions of expression levels provided by the Affymetrix software:
o LogAvg (pre-processing at the logarithmic level)
o AvgDiff (the usual version; problems with negative values and skewed distributions), used with cube-root transformation

15 Selection of 10 genes (logarithmic values)
[Table: mean and standard deviation of var001 to var010 in the hot-nodule and cold-nodule groups]
Variances not too different!

16 The same genes with cube-root transformation
[Table: mean and standard deviation of VAR001 to VAR010 in the hot-nodule and cold-nodule groups]
Variances more heterogeneous!

17 4. Results
A: ALL data
Full set of genes (p = …, n1 = 10, n2 = 11, α = 0.05):
Parametric sequential procedure: implemented in the SPSS matrix language; a few minutes of processing time on a Pentium III PC, no special problems.
[Table: numbers of locally significant, Bonferroni-significant and sequentially significant genes for the logarithmic and the cube-root transformation; one entry notes "(of the 11 above)"]
Westfall/Young: SAS can handle at most … variables.

18 Reduced set of genes (p = 9805), all versions based on logarithmic values:
locally significant: …
Bonferroni significant: 7
parametric sequential: 10 (2 minutes)
nonparametric sequential: 7 (1)
Westfall/Young: 12 (30 minutes)
(1) with standard SPSS procedures; borderline load with large pivot tables

19 Nonparametric sequential procedure for the restricted ALL data
[Table: SPSS output for genes G(1) to G(12): Mann-Whitney U, its significance, and the Monte Carlo significance (two-sided); cases identified by CASE_LBL and GROUP, with the labels BM, TS and CB for the controls and ALL for the patients]

20 B: hot and cold nodules in thyroid glands (p = …)
a) Both types of nodules together vs. surrounding tissue (n = 5 + 5): one-sample problem; Westfall/Young not available in SAS
Results (logarithmic data from Affymetrix / cube root):
local significances: …
Bonferroni: …
parametric sequential: 4 / 4 (1 in common)
nonparametric sequential: 6

21 b) Hot vs. cold nodules (n1 = n2 = 5)
Many local significances (1465 with ln, … with the cube root; in both cases more than 10% of the genes), but nothing at the familywise error level (sequential, Holm, Westfall-Young).
Again the question: is the familywise error requirement too hard?

22 Alternative requirement: the false discovery rate (Benjamini/Hochberg, 1995), i.e. the expected proportion of falsely rejected null hypotheses among all rejected hypotheses.
Recently discussed for array analyses (Tusher, Tibshirani, Chu, 2001; Dudoit et al., ISCB, 2001).
Advantage: the requirement does not become stricter for high-dimensional arrays than for low-dimensional ones.
Problems:
1. In a situation with many non-null genes, a lot of falsely detected genes can also be hidden among the discoveries.
2. Can it be controlled properly? Benjamini and Hochberg's proposal treats only independent (uncorrelated) tests.
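The step-up rule of Benjamini and Hochberg (1995) and the corresponding adjusted p-values can be sketched in Python as follows. This is a minimal illustration with hypothetical function names and invented p-values. The 1995 guarantee covers independent test statistics.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Indices rejected by the Benjamini-Hochberg step-up procedure:
    find the largest k with p_(k) <= k * q / m and reject the k smallest p-values."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda j: pvalues[j])
    k_max = 0
    for k, j in enumerate(order, start=1):
        if pvalues[j] <= k * q / m:
            k_max = k  # step-up: a later success overrides earlier failures
    return sorted(order[:k_max])


def bh_adjusted(pvalues):
    """BH-adjusted p-values: min over k >= rank of m * p_(k) / k, capped at 1."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda j: pvalues[j])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        j = order[rank - 1]
        running_min = min(running_min, m * pvalues[j] / rank)
        adjusted[j] = running_min
    return adjusted


p = [0.01, 0.02, 0.03, 0.04]
print(benjamini_hochberg(p))  # [0, 1, 2, 3]
print(bh_adjusted(p))
```

With these invented p-values, Holm's method at α = 0.05 would reject only the first hypothesis (0.02 > 0.05/3), while the FDR criterion at q = 0.05 rejects all four. A hypothesis is rejected exactly when its adjusted p-value is at most q.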

23 SAS help text, PROC MULTTEST, FDR option: "The FDR option requests adjusted p-values using the method of Benjamini and Hochberg. These p-values do not control the familywise error rate, but they do control the false discovery rate in some cases."
ALL data (ln): … genes, … local significances, 7 Holm, 10/7 sequential, 12 Westfall/Young, FDR 398
Hot/cold nodules (ln): … genes, … local significances, 0 familywise, 3 FDR

24 5. Summary
The experimentwise error can be controlled even in extremely high dimensions, but usually only few significant variables will be found, depending on the procedure and the transformation.
Alternative: the FDR seems sensible.
We should report several versions of significance.
The combination of a standard PC and standard statistical software is at its limits with such high dimensions; there are hard and soft restrictions.
Raw data should be transformed, but the appropriate transformation may depend on the type of array and other factors.

25 References:
Benjamini, Y., Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Statist. Soc. B 57.
Dudoit, S., Yang, Y.H., Callow, M.J., Speed, T.P. (2000). Statistical methods for identifying differentially expressed genes in replicated cDNA microarray experiments. Technical Report #578, Stanford University School of Medicine.
Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scand. J. Statist. 6.
Kropf, S. (2000). Hochdimensionale multivariate Verfahren in der medizinischen Statistik. Shaker Verlag, Aachen.
Tusher, V.G., Tibshirani, R., Chu, G. (2001). Significance analysis of microarrays applied to the ionizing radiation response. PNAS 98.
Westfall, P.H., Young, S.S. (1993). Resampling-based multiple testing. John Wiley & Sons, New York.


More information

Power & Sample Size. Dr. Andrea Benedetti

Power & Sample Size. Dr. Andrea Benedetti Power & Sample Size Dr. Andrea Benedetti Plan Review of hypothesis testing Power and sample size Basic concepts Formulae for common study designs Using the software When should you think about power &

More information

AMSc Research Methods Research approach IV: Experimental [2]

AMSc Research Methods Research approach IV: Experimental [2] AMSc Research Methods Research approach IV: Experimental [2] Marie-Luce Bourguet mlb@dcs.qmul.ac.uk Statistical Analysis 1 Statistical Analysis Descriptive Statistics : A set of statistical procedures

More information

Still important ideas

Still important ideas Readings: OpenStax - Chapters 1 11 + 13 & Appendix D & E (online) Plous - Chapters 2, 3, and 4 Chapter 2: Cognitive Dissonance, Chapter 3: Memory and Hindsight Bias, Chapter 4: Context Dependence Still

More information

Gene expression analysis. Roadmap. Microarray technology: how it work Applications: what can we do with it Preprocessing: Classification Clustering

Gene expression analysis. Roadmap. Microarray technology: how it work Applications: what can we do with it Preprocessing: Classification Clustering Gene expression analysis Roadmap Microarray technology: how it work Applications: what can we do with it Preprocessing: Image processing Data normalization Classification Clustering Biclustering 1 Gene

More information

CLINICAL RESEARCH METHODS VISP356. MODULE LEADER: PROF A TOMLINSON B.Sc./B.Sc.(HONS) OPTOMETRY

CLINICAL RESEARCH METHODS VISP356. MODULE LEADER: PROF A TOMLINSON B.Sc./B.Sc.(HONS) OPTOMETRY DIVISION OF VISION SCIENCES SESSION: 2006/2007 DIET: 1ST CLINICAL RESEARCH METHODS VISP356 LEVEL: MODULE LEADER: PROF A TOMLINSON B.Sc./B.Sc.(HONS) OPTOMETRY MAY 2007 DURATION: 2 HRS CANDIDATES SHOULD

More information

Behavioral Data Mining. Lecture 4 Measurement

Behavioral Data Mining. Lecture 4 Measurement Behavioral Data Mining Lecture 4 Measurement Outline Hypothesis testing Parametric statistical tests Non-parametric tests Precision-Recall plots ROC plots Hardware update Icluster machines are ready for

More information

Testing Means. Related-Samples t Test With Confidence Intervals. 6. Compute a related-samples t test and interpret the results.

Testing Means. Related-Samples t Test With Confidence Intervals. 6. Compute a related-samples t test and interpret the results. 10 Learning Objectives Testing Means After reading this chapter, you should be able to: Related-Samples t Test With Confidence Intervals 1. Describe two types of research designs used when we select related

More information

Overview of Lecture. Survey Methods & Design in Psychology. Correlational statistics vs tests of differences between groups

Overview of Lecture. Survey Methods & Design in Psychology. Correlational statistics vs tests of differences between groups Survey Methods & Design in Psychology Lecture 10 ANOVA (2007) Lecturer: James Neill Overview of Lecture Testing mean differences ANOVA models Interactions Follow-up tests Effect sizes Parametric Tests

More information

Russian Journal of Agricultural and Socio-Economic Sciences, 3(15)

Russian Journal of Agricultural and Socio-Economic Sciences, 3(15) ON THE COMPARISON OF BAYESIAN INFORMATION CRITERION AND DRAPER S INFORMATION CRITERION IN SELECTION OF AN ASYMMETRIC PRICE RELATIONSHIP: BOOTSTRAP SIMULATION RESULTS Henry de-graft Acquah, Senior Lecturer

More information

SEM: the precision of the mean of the sample in predicting the population parameter and is a way of relating the sample mean to population mean

SEM: the precision of the mean of the sample in predicting the population parameter and is a way of relating the sample mean to population mean 1999b(9)/1997a(14)/1995b(17): What is meant by 95% confidence interval? Explain the practical applications of CIs and indicate why they may be preferred to P values General: 95% CI defines the range of

More information

A MONTE CARLO SIMULATION STUDY FOR COMPARING PERFORMANCES OF SOME HOMOGENEITY OF VARIANCES TESTS

A MONTE CARLO SIMULATION STUDY FOR COMPARING PERFORMANCES OF SOME HOMOGENEITY OF VARIANCES TESTS A MONTE CARLO SIMULATION STUDY FOR COMPARING PERFORMANCES OF SOME HOMOGENEITY OF VARIANCES TESTS Hamit MIRTAGIOĞLU Bitlis Eren University, Faculty of Arts and Sciences, Department of Statistics, Bitlis-Turkey

More information

Research Manual COMPLETE MANUAL. By: Curtis Lauterbach 3/7/13

Research Manual COMPLETE MANUAL. By: Curtis Lauterbach 3/7/13 Research Manual COMPLETE MANUAL By: Curtis Lauterbach 3/7/13 TABLE OF CONTENTS INTRODUCTION 1 RESEARCH DESIGN 1 Validity 1 Reliability 1 Within Subjects 1 Between Subjects 1 Counterbalancing 1 Table 1.

More information

Research Analysis MICHAEL BERNSTEIN CS 376

Research Analysis MICHAEL BERNSTEIN CS 376 Research Analysis MICHAEL BERNSTEIN CS 376 Last time What is a statistical test? Chi-square t-test Paired t-test 2 Today ANOVA Posthoc tests Two-way ANOVA Repeated measures ANOVA 3 Recall: hypothesis testing

More information

Statistics Guide. Prepared by: Amanda J. Rockinson- Szapkiw, Ed.D.

Statistics Guide. Prepared by: Amanda J. Rockinson- Szapkiw, Ed.D. This guide contains a summary of the statistical terms and procedures. This guide can be used as a reference for course work and the dissertation process. However, it is recommended that you refer to statistical

More information

MEA DISCUSSION PAPERS

MEA DISCUSSION PAPERS Inference Problems under a Special Form of Heteroskedasticity Helmut Farbmacher, Heinrich Kögel 03-2015 MEA DISCUSSION PAPERS mea Amalienstr. 33_D-80799 Munich_Phone+49 89 38602-355_Fax +49 89 38602-390_www.mea.mpisoc.mpg.de

More information

Lab 7 (100 pts.): One-Way ANOVA Objectives: Analyze data via the One-Way ANOVA

Lab 7 (100 pts.): One-Way ANOVA Objectives: Analyze data via the One-Way ANOVA STAT 350 (Spring 2015) Lab 7: SAS Solution 1 Lab 7 (100 pts.): One-Way ANOVA Objectives: Analyze data via the One-Way ANOVA A. (50 pts.) Do isoflavones increase bone mineral density? (ex12-45bmd.txt) Kudzu

More information

Research Methods 1 Handouts, Graham Hole,COGS - version 1.0, September 2000: Page 1:

Research Methods 1 Handouts, Graham Hole,COGS - version 1.0, September 2000: Page 1: Research Methods 1 Handouts, Graham Hole,COGS - version 10, September 000: Page 1: T-TESTS: When to use a t-test: The simplest experimental design is to have two conditions: an "experimental" condition

More information

Bayesian Confidence Intervals for Means and Variances of Lognormal and Bivariate Lognormal Distributions

Bayesian Confidence Intervals for Means and Variances of Lognormal and Bivariate Lognormal Distributions Bayesian Confidence Intervals for Means and Variances of Lognormal and Bivariate Lognormal Distributions J. Harvey a,b, & A.J. van der Merwe b a Centre for Statistical Consultation Department of Statistics

More information

Statistics for EES Factorial analysis of variance

Statistics for EES Factorial analysis of variance Statistics for EES Factorial analysis of variance Dirk Metzler http://evol.bio.lmu.de/_statgen 1. July 2013 1 ANOVA and F-Test 2 Pairwise comparisons and multiple testing 3 Non-parametric: The Kruskal-Wallis

More information

Sampling for Impact Evaluation. Maria Jones 24 June 2015 ieconnect Impact Evaluation Workshop Rio de Janeiro, Brazil June 22-25, 2015

Sampling for Impact Evaluation. Maria Jones 24 June 2015 ieconnect Impact Evaluation Workshop Rio de Janeiro, Brazil June 22-25, 2015 Sampling for Impact Evaluation Maria Jones 24 June 2015 ieconnect Impact Evaluation Workshop Rio de Janeiro, Brazil June 22-25, 2015 How many hours do you expect to sleep tonight? A. 2 or less B. 3 C.

More information

MOST: detecting cancer differential gene expression

MOST: detecting cancer differential gene expression Biostatistics (2008), 9, 3, pp. 411 418 doi:10.1093/biostatistics/kxm042 Advance Access publication on November 29, 2007 MOST: detecting cancer differential gene expression HENG LIAN Division of Mathematical

More information

Readings Assumed knowledge

Readings Assumed knowledge 3 N = 59 EDUCAT 59 TEACHG 59 CAMP US 59 SOCIAL Analysis of Variance 95% CI Lecture 9 Survey Research & Design in Psychology James Neill, 2012 Readings Assumed knowledge Howell (2010): Ch3 The Normal Distribution

More information

Comparing multiple proportions

Comparing multiple proportions Comparing multiple proportions February 24, 2017 psych10.stanford.edu Announcements / Action Items Practice and assessment problem sets will be posted today, might be after 5 PM Reminder of OH switch today

More information

Statistical Analysis of Single Nucleotide Polymorphism Microarrays in Cancer Studies

Statistical Analysis of Single Nucleotide Polymorphism Microarrays in Cancer Studies Statistical Analysis of Single Nucleotide Polymorphism Microarrays in Cancer Studies Stanford Biostatistics Workshop Pierre Neuvial with Henrik Bengtsson and Terry Speed Department of Statistics, UC Berkeley

More information

Comparison of Gene Set Analysis with Various Score Transformations to Test the Significance of Sets of Genes

Comparison of Gene Set Analysis with Various Score Transformations to Test the Significance of Sets of Genes Comparison of Gene Set Analysis with Various Score Transformations to Test the Significance of Sets of Genes Ivan Arreola and Dr. David Han Department of Management of Science and Statistics, University

More information

Application of the concept of False Discovery Rate on predicted cancer outcome with microarrays

Application of the concept of False Discovery Rate on predicted cancer outcome with microarrays Mathematical Statistics Stockholm University Application of the concept of False Discovery Rate on predicted cancer outcome with microarrays Sally Salih Examensarbete 2006:1 Postal address: Mathematical

More information

Research and Evaluation Methodology Program, School of Human Development and Organizational Studies in Education, University of Florida

Research and Evaluation Methodology Program, School of Human Development and Organizational Studies in Education, University of Florida Vol. 2 (1), pp. 22-39, Jan, 2015 http://www.ijate.net e-issn: 2148-7456 IJATE A Comparison of Logistic Regression Models for Dif Detection in Polytomous Items: The Effect of Small Sample Sizes and Non-Normality

More information

n Outline final paper, add to outline as research progresses n Update literature review periodically (check citeseer)

n Outline final paper, add to outline as research progresses n Update literature review periodically (check citeseer) Project Dilemmas How do I know when I m done? How do I know what I ve accomplished? clearly define focus/goal from beginning design a search method that handles plateaus improve some ML method s robustness

More information

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo Please note the page numbers listed for the Lind book may vary by a page or two depending on which version of the textbook you have. Readings: Lind 1 11 (with emphasis on chapters 10, 11) Please note chapter

More information

7 Statistical Issues that Researchers Shouldn t Worry (So Much) About

7 Statistical Issues that Researchers Shouldn t Worry (So Much) About 7 Statistical Issues that Researchers Shouldn t Worry (So Much) About By Karen Grace-Martin Founder & President About the Author Karen Grace-Martin is the founder and president of The Analysis Factor.

More information

Integrative Biology 200A PRINCIPLES OF PHYLOGENETICS Spring 2012

Integrative Biology 200A PRINCIPLES OF PHYLOGENETICS Spring 2012 Integrative Biology 200A PRINCIPLES OF PHYLOGENETICS Spring 2012 University of California, Berkeley Kipling Will- 1 March Data/Hypothesis Exploration and Support Measures I. Overview. -- Many would agree

More information

Unit 1 Exploring and Understanding Data

Unit 1 Exploring and Understanding Data Unit 1 Exploring and Understanding Data Area Principle Bar Chart Boxplot Conditional Distribution Dotplot Empirical Rule Five Number Summary Frequency Distribution Frequency Polygon Histogram Interquartile

More information

MS&E 226: Small Data

MS&E 226: Small Data MS&E 226: Small Data Lecture 10: Introduction to inference (v2) Ramesh Johari ramesh.johari@stanford.edu 1 / 17 What is inference? 2 / 17 Where did our data come from? Recall our sample is: Y, the vector

More information

Profile Analysis. Intro and Assumptions Psy 524 Andrew Ainsworth

Profile Analysis. Intro and Assumptions Psy 524 Andrew Ainsworth Profile Analysis Intro and Assumptions Psy 524 Andrew Ainsworth Profile Analysis Profile analysis is the repeated measures extension of MANOVA where a set of DVs are commensurate (on the same scale). Profile

More information

One-Way ANOVAs t-test two statistically significant Type I error alpha null hypothesis dependant variable Independent variable three levels;

One-Way ANOVAs t-test two statistically significant Type I error alpha null hypothesis dependant variable Independent variable three levels; 1 One-Way ANOVAs We have already discussed the t-test. The t-test is used for comparing the means of two groups to determine if there is a statistically significant difference between them. The t-test

More information

Research Manual STATISTICAL ANALYSIS SECTION. By: Curtis Lauterbach 3/7/13

Research Manual STATISTICAL ANALYSIS SECTION. By: Curtis Lauterbach 3/7/13 Research Manual STATISTICAL ANALYSIS SECTION By: Curtis Lauterbach 3/7/13 TABLE OF CONTENTS INTRODUCTION 1 STATISTICAL ANALYSIS 1 Overview 1 Dependent Variable 1 Independent Variable 1 Interval 1 Ratio

More information

Learning Objectives 9/9/2013. Hypothesis Testing. Conflicts of Interest. Descriptive statistics: Numerical methods Measures of Central Tendency

Learning Objectives 9/9/2013. Hypothesis Testing. Conflicts of Interest. Descriptive statistics: Numerical methods Measures of Central Tendency Conflicts of Interest I have no conflict of interest to disclose Biostatistics Kevin M. Sowinski, Pharm.D., FCCP Last-Chance Ambulatory Care Webinar Thursday, September 5, 2013 Learning Objectives For

More information