False Discovery Rates and Copy Number Variation. Bradley Efron and Nancy Zhang Stanford University
2 Three Statistical Centuries
19th (Quetelet): huge data sets, simple questions
20th (Fisher, Neyman, Hotelling, ...): small data sets, simple questions
21st (scientific mass production): huge data sets, complicated questions
3 Example: Copy Number Variation (CNV)
Gains and losses of chromosome segments (disease association)
Instead of 2 copies, might have 0, 1, 3, 4, ...
Data: $x_{ij}$ = noisy measurement of copy number for subject $j$ at marker position $i$, with $i = 1, 2, \ldots, N$ ($N = 5000$) and $j = 1, 2, \ldots, n$ ($n = 150$) (< 1% of data!)
$x_{ij}$ is approximately normal with mean 0 if copy number = 2
4 What We Expect To See
Hot positions: positions $i$ where several subjects $j$ show unusually high (or low) values $x_{ij}$
For some subjects $j$: intervals of high (or low) values $x_{ij}$
Information on CNV locations in both directions
5 [Figure] Lowest .001 of the 750,000 entries x[i,j], plotted as subject number j versus position number i. Subject 45 shows an interval of low values around position 3800. Is position 1755 CNV prone?
6 Z-Values
$X$: $x_{ij}$ for $i = 1, 2, \ldots, N = 5000$ positions, $j = 1, 2, \ldots, n = 150$ subjects
$C_i$ = all subjects at the $i$th position ($n = 150$)
Moving averages: replace $x_{ij}$ with $\tilde{x}_{ij} = \sum_{l=i-5}^{i+5} x_{lj}/11$, giving $\tilde{X}$ (subject $j$'s measurements averaged over nearby positions)
Standardize rows of $\tilde{X}$ (robust standardization): subtract the row median, divide by a robust estimate of the row scale
Gives the $Z$ matrix $z_{ij}$, which feeds the iterative $\mathrm{fdr}_i$ estimates
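The smoothing and standardization steps on this slide can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the array is stored as positions by subjects, edge positions without a full 11-point window are dropped, and the robust scale is the MAD rescaled by 1.4826. The slides do not say how edges were handled or which robust scale was used, and the function names are hypothetical.

```python
import numpy as np

def moving_average(x, half_width=5):
    """Average each subject's values over positions i-half_width .. i+half_width.

    x has shape (N positions, n subjects). Positions without a full window
    are dropped (an assumption; the slides do not describe edge handling).
    """
    window = 2 * half_width + 1
    kernel = np.ones(window) / window
    # mode="valid" keeps only positions that have a complete window
    return np.apply_along_axis(
        lambda col: np.convolve(col, kernel, mode="valid"), 0, x
    )

def robust_standardize(x_tilde, axis=0):
    """Subtract the median and divide by a MAD-based scale along `axis`.

    The MAD with the 1.4826 factor is an assumed choice of robust scale.
    """
    med = np.median(x_tilde, axis=axis, keepdims=True)
    mad = np.median(np.abs(x_tilde - med), axis=axis, keepdims=True)
    return (x_tilde - med) / (1.4826 * mad)

# Small synthetic stand-in for the 5000 x 150 data matrix
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 20))
x_tilde = moving_average(x)            # shape (190, 20)
z = robust_standardize(x_tilde, axis=0)  # one median and scale per subject
```

Here `axis=0` standardizes each subject separately, which matches the slides' use of "row j" for subject j; with a subjects-by-positions layout the same call would use `axis=1`.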
7 Simultaneous Hypothesis Testing
$M$ null hypotheses $H_{01}, H_{02}, \ldots, H_{0M}$ ($M = 750{,}000$ for CNV)
Case $m$ has test statistic $z_m$, null density $f_0(z)$
The problem: given $z = (z_1, z_2, \ldots, z_M)$, simultaneously test all $M$ null hypotheses and don't make many mistakes!
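8 The Bayesian Two-Groups Model
Null cases have density $f_0(z)$ with prior probability $\pi_0$; non-null cases have density $f_1(z)$ with prior probability $\pi_1 = 1 - \pi_0$; mixture density $f(z) = \pi_0 f_0(z) + \pi_1 f_1(z)$
Local false discovery rate: $\mathrm{fdr}(z) = \Pr\{\text{null} \mid z\} = \pi_0 f_0(z)/f(z)$
Empirical Bayes: $z \rightarrow \hat\pi_0, \hat f_0, \hat f \rightarrow \widehat{\mathrm{fdr}}(z) = \hat\pi_0 \hat f_0(z)/\hat f(z)$
Reject $H_{0m}$ if $\widehat{\mathrm{fdr}}(z_m)$ is small (see Efron, 2008)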
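To make the local fdr formula concrete, here is a minimal sketch that evaluates $\pi_0 f_0(z)/f(z)$ when all three ingredients are known. It is not the empirical Bayes procedure of the slides, which estimates $\hat\pi_0$, $\hat f_0$ and $\hat f$ from the data; the non-null density N(3, 1), the value $\pi_0 = 0.95$, and the function names are illustrative assumptions.

```python
import numpy as np

def normal_pdf(z, mean=0.0, sd=1.0):
    """Density of N(mean, sd^2) evaluated at z."""
    return np.exp(-0.5 * ((z - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def local_fdr(z, pi0, f0, f):
    """fdr(z) = pi0 * f0(z) / f(z), clipped to the interval [0, 1]."""
    return np.clip(pi0 * f0(z) / f(z), 0.0, 1.0)

# Assumed two-groups model: 95% null N(0,1), 5% non-null N(3,1)
pi0, pi1 = 0.95, 0.05
f0 = lambda z: normal_pdf(z, 0.0, 1.0)
f1 = lambda z: normal_pdf(z, 3.0, 1.0)
f = lambda z: pi0 * f0(z) + pi1 * f1(z)   # mixture density

z_grid = np.array([0.0, 2.0, 4.0])
fdr_values = local_fdr(z_grid, pi0, f0, f)  # near 1 at z = 0, small at z = 4
```

In practice `f0` and `f` would be replaced by fitted curves, for example a smooth fit to the histogram of all z-values for `f`.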
9 [Figure] Estimated local false discovery rate, all 750,000 z-values; pihat0 = .954, estimated null density N(.04, .93^2). Local fdr plotted versus z value; fdrhat(z) = .1 at z = 3.30 and 3.57.
10 [Figure] z-values at position i = 1755 (solid histogram) compared to all the others (line); the horizontal axis runs from low CNV to high CNV. A second panel repeats the comparison for another position i.
11 A More General Model
Classes: $C_1, C_2, \ldots, C_i, \ldots, C_N$ with $n_i$ cases in $C_i$
CNV: $C_i$ = $i$th column, $n_i = n = 150$ (the $n = 150$ subjects measured at position $i$)
$\mathrm{fdr}_i(z) = \pi_{i0} f_{i0}(z)/f_i(z) = \Pr\{\text{null} \mid z, C_i\}$
12 Combined and Separate Fdr's
Strategy: estimate $\mathrm{fdr}(z) = \pi_0 f_0(z)/f(z)$ from combined data and then modify for $C_i$
Assume $f_{i0}(z)$ and $f_{i1}(z)$ do not depend on $i$, with only $\pi_{i1} = \Pr\{\text{non-null} \mid C_i\}$ varying across classes:
$\mathrm{fdr}_i(z) = \mathrm{fdr}(z) / [1 + \mathrm{tdr}(z) S_i]$
where $\mathrm{tdr}(z) \equiv 1 - \mathrm{fdr}(z)$ is the true discovery rate and $S_i = \dfrac{\pi_{i1}/\pi_{i0}}{\pi_1/\pi_0} - 1$
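The class-specific formula on this slide follows from the two-groups model in a few lines. The derivation below is a reconstruction from the slide's own definitions, assuming $f_{i0} = f_0$ and $f_{i1} = f_1$ for every class and writing $1 + S_i$ for the ratio of class odds to overall odds.

```latex
\begin{align*}
\mathrm{fdr}_i(z)
  &= \frac{\pi_{i0} f_0(z)}{\pi_{i0} f_0(z) + \pi_{i1} f_1(z)}
   = \frac{1}{1 + \dfrac{\pi_{i1}}{\pi_{i0}} \dfrac{f_1(z)}{f_0(z)}},
  \qquad
  \frac{f_1(z)}{f_0(z)} = \frac{\pi_0}{\pi_1}\,\frac{\mathrm{tdr}(z)}{\mathrm{fdr}(z)}, \\[4pt]
  &= \frac{\mathrm{fdr}(z)}{\mathrm{fdr}(z) + (1 + S_i)\,\mathrm{tdr}(z)}
   = \frac{\mathrm{fdr}(z)}{1 + \mathrm{tdr}(z)\,S_i},
  \qquad
  1 + S_i = \frac{\pi_{i1}/\pi_{i0}}{\pi_1/\pi_0}.
\end{align*}
```

The last step uses $\mathrm{fdr}(z) + \mathrm{tdr}(z) = 1$. When the class odds equal the overall odds, $S_i = 0$ and the class-specific rate reduces to the combined one.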
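13 Iterative Estimation of $\mathrm{fdr}_i(z)$ (Model 1)
First: estimate $\widehat{\mathrm{fdr}}(z) = \hat\pi_0 \hat f_0(z)/\hat f(z)$ from the combined data $(z_1, z_2, \ldots, z_M)$
If there are $k_i$ non-nulls in $C_i$: $\hat\pi_{i1} = k_i/n$ gives $\hat S_i$ and
$\widehat{\mathrm{fdr}}_i(z) = \widehat{\mathrm{fdr}}(z) / [1 + \widehat{\mathrm{tdr}}(z) \hat S_i]$, with $\widehat{\mathrm{tdr}}_i = 1 - \widehat{\mathrm{fdr}}_i$
But $\hat k_i = \sum_{C_i} \widehat{\mathrm{tdr}}_i(z_m)$ estimates $k_i$
Iterate! (5 cycles plenty in what follows)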
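The iteration on this slide can be sketched as follows. This is a hedged illustration rather than the authors' code: it assumes the combined fdr estimates are already available as a classes-by-cases array, starts the cycle from the combined tdr values, and clips $\hat\pi_{i1}$ away from 0 and 1 so the odds stay finite. The function name `iterate_class_fdr` and the `eps` guard are not in the source.

```python
import numpy as np

def iterate_class_fdr(fdr, pi0, n_cycles=5, eps=1e-6):
    """Model 1 iteration for class-specific local fdr.

    fdr : array (N classes, n cases) of combined fdr-hat values.
    pi0 : combined null proportion estimate.
    Returns the class-specific fdr array and the k-hat vector.
    """
    fdr = np.asarray(fdr, dtype=float)
    n = fdr.shape[1]
    tdr = 1.0 - fdr
    pi1 = 1.0 - pi0
    fdr_i = fdr.copy()                      # assumed starting point
    for _ in range(n_cycles):
        k_hat = (1.0 - fdr_i).sum(axis=1)   # k_i-hat = sum of tdr_i over C_i
        pi_i1 = np.clip(k_hat / n, eps, 1.0 - eps)
        s_i = (pi_i1 / (1.0 - pi_i1)) / (pi1 / pi0) - 1.0
        fdr_i = fdr / (1.0 + tdr * s_i[:, None])
    k_hat = (1.0 - fdr_i).sum(axis=1)
    return fdr_i, k_hat

# Toy input: class 0 looks enriched for non-nulls, class 1 looks null
fdr_combined = np.vstack([np.full(10, 0.5), np.full(10, 0.99)])
fdr_by_class, k_hat = iterate_class_fdr(fdr_combined, pi0=0.95)
```

In the toy input the first class's estimate is pushed up from its starting value of 5 and the second class's is pushed down from 0.1, which is the sharpening effect the iteration is meant to have.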
14 [Figure] Points where fdrhat < .01. Five iterations of Model 1, z[i,j] from moving averages (i-5, i+5). Subject plotted versus marker position.
15 [Figure] First panel: points where fdrhat.i < .01 (five iterations), z[i,j] from moving averages (i-5, i+5); subject j versus marker position i. Second panel: khat[i] estimates for the 5000 positions; khat[i] versus marker position i, with khat[1755] labeled.
16 [Figure] Points where fdrk < .01; closeup of positions 1700:1800; shows possible CNV region at 1750. Panels: subject versus marker position, and khat versus marker position.
17 Is Position 1755 Significant?
$\hat k_{1755} = 39.1$. Believe CNV action at 1755? [$\bar k = 8.13$]
Permutation test:
Randomly shift row $j$ of $X$ by $s_j$ units left (with wraparound): $x^*_j = (x_{s+1,j}, x_{s+2,j}, \ldots, x_{5000,j}, x_{1j}, x_{2j}, \ldots, x_{sj})$
Do this for all 150 rows
Recalculate the $\hat k^*_i$ values
Compare $\hat k_{1755} = 39.1$ with $\{\hat k^*_i,\ i = 1, 2, \ldots, 5000\}$
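The wraparound shift used in this permutation test maps directly onto `numpy.roll`. The sketch below assumes a subjects-by-positions array, so that "row j" is subject j as on the slide, and draws each shift uniformly. The `statistic` argument stands in for the full recalculation of the k-hat values, and both function names are hypothetical.

```python
import numpy as np

def circular_shift_rows(x, rng):
    """Shift each row of x left by its own random offset, with wraparound."""
    n_rows, n_cols = x.shape
    shifts = rng.integers(0, n_cols, size=n_rows)
    # np.roll with a negative shift moves entries to the left
    shifted = np.stack([np.roll(row, -s) for row, s in zip(x, shifts)])
    return shifted, shifts

def permutation_null(x, statistic, n_perm, rng):
    """Collect statistic(x*) over n_perm independently shifted copies of x."""
    return np.array(
        [statistic(circular_shift_rows(x, rng)[0]) for _ in range(n_perm)]
    )

rng = np.random.default_rng(1)
x = np.arange(12.0).reshape(3, 4)          # 3 subjects, 4 positions
x_star, shifts = circular_shift_rows(x, rng)

# Placeholder statistic: largest column sum, standing in for max k-hat
null_values = permutation_null(x, lambda a: a.sum(axis=0).max(), 20, rng)
```

Shifting each subject independently destroys the alignment of positions across subjects while keeping each subject's own spatial structure, which is why the shifted k-hat values serve as a null reference.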
18 [Figure] Actual khat distribution compared to permutation distribution; maximum khat = 23.3. Frequency plotted versus khat values for the actual data and the permutations, with 39.1 marked.
19 Locally Most Powerful Tests
Let $r_i = \pi_{i1}/\pi_1 = \Pr\{\text{non-null} \mid C_i\} / \Pr\{\text{non-null}\}$.
$l_i = \sum_{j=1}^{n} \log\{1 + (r_i - 1) T(z_{ij})\}$ where $T(z) = \dfrac{\mathrm{tdr}(z) - \pi_1}{\pi_0}$
$\hat k_i$ is nearly the MLE in this model
Test $H_{0i}: r_i = 1$ vs $r_i > 1$. The locally most powerful test rejects for large values of $\hat k^{(1)}_i$.
Use permutations to get p-values.
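A short derivation shows why the first-cycle estimate is the locally most powerful statistic. It is a reconstruction under the Model 1 assumptions (common $f_0$ and $f_1$ across classes), with $\hat k^{(1)}_i = \sum_j \mathrm{tdr}(z_{ij})$ taken to be the first-iteration estimate.

```latex
\begin{align*}
\frac{f_i(z)}{f(z)}
  &= \frac{(1 - r_i \pi_1) f_0(z) + r_i \pi_1 f_1(z)}{f(z)}
   = 1 + (r_i - 1)\,\frac{\pi_1 \{f_1(z) - f_0(z)\}}{f(z)}
   = 1 + (r_i - 1)\,\frac{\mathrm{tdr}(z) - \pi_1}{\pi_0}, \\[4pt]
\left.\frac{\partial l_i}{\partial r_i}\right|_{r_i = 1}
  &= \sum_{j=1}^{n} T(z_{ij})
   = \frac{1}{\pi_0}\Bigl\{\sum_{j=1}^{n} \mathrm{tdr}(z_{ij}) - n\pi_1\Bigr\}
   = \frac{\hat k^{(1)}_i - n\pi_1}{\pi_0}.
\end{align*}
```

The score at $r_i = 1$ is an increasing linear function of $\hat k^{(1)}_i$, so rejecting for large scores is the same as rejecting for large $\hat k^{(1)}_i$.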
20 Bootstrapping $\hat k_i$ Estimates
Resample rows (i.e., subjects)
Recompute the iterative estimate $\hat k^*_i$ (5 iterations model)
$\widehat{\mathrm{sd}}_i$ = bootstrap standard deviation of $\hat k^*_i$, $B = 100$ resamples (did not recompute original fdr curve each time)
$\hat k^*_i \mathrel{\dot\sim} N(\hat k_i, \widehat{\mathrm{sd}}_i^2)$ [$6 \le \widehat{\mathrm{sd}}_i \le 7$ for $\hat k_i > 20$]
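The subject-level bootstrap can be sketched as below. As on the slide, the fdr curve is held fixed, so the input is a positions-by-subjects array of tdr values and only the subjects are resampled. For brevity the demo statistic is the first-cycle estimate (the row sums of tdr) rather than the full five-cycle iteration; that simplification, the layout, and the function names are assumptions of this sketch.

```python
import numpy as np

def bootstrap_sd(tdr, statistic, n_boot=100, seed=0):
    """Bootstrap standard deviation of statistic(tdr), resampling subjects.

    tdr has shape (N positions, n subjects). Subjects (columns) are drawn
    with replacement; positions are left intact.
    """
    rng = np.random.default_rng(seed)
    n_positions, n_subjects = tdr.shape
    reps = np.empty((n_boot, n_positions))
    for b in range(n_boot):
        cols = rng.integers(0, n_subjects, size=n_subjects)
        reps[b] = statistic(tdr[:, cols])
    return reps.std(axis=0, ddof=1)

def first_cycle_k_hat(tdr):
    """First-cycle estimate: sum of tdr values over subjects, per position."""
    return tdr.sum(axis=1)

rng = np.random.default_rng(2)
tdr = rng.uniform(0.0, 1.0, size=(6, 30))
sd_hat = bootstrap_sd(tdr, first_cycle_k_hat, n_boot=100)
```

Swapping `first_cycle_k_hat` for a function that runs the full iteration on the resampled columns would give the five-cycle version described on the slide.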
21 [Figure] Bootstrap estimates of standard deviations for khat[i] values (5 iterations), plotted versus khat[i]; sdhat[1755] = 6.5.
22 Brown-Stein-Robbins Estimation
Suppose $\mu \sim g(\cdot)$ and $x \mid \mu \sim N(\mu, \sigma^2)$
$l(x) \equiv$ log marginal density of $x$
$\mu \mid x \sim \left(x + \sigma^2 l'(x),\ \sigma^2(1 + \sigma^2 l''(x))\right)$ (posterior mean and posterior variance)
Apply with $\mu = k_i$, $x = \hat k_i$, $\hat l(x)$ = log of the smoothed density of $\{\hat k_i\}$
For $\hat k_i = 39.1$, $\hat\sigma = 6.5$, gave $k_{1755} \sim (41.3,\ )$
Conclusion: even taking account of selection effects, $k_{1755}$ is probably much larger than $\bar k$
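The posterior mean and variance formula on this slide can be checked numerically. The sketch below takes the derivatives of the log marginal density by central differences and verifies the result on a case with a known answer, a normal prior $\mu \sim N(0, A)$, for which the posterior mean is $xA/(A + \sigma^2)$ and the posterior variance is $\sigma^2 A/(A + \sigma^2)$. The function name, the step size `h`, and the normal-prior example are assumptions for illustration; the slides instead use the log of a smoothed density of the k-hat values.

```python
import numpy as np

def tweedie_posterior(x, sigma, log_marginal, h=1e-3):
    """Posterior mean and variance of mu given x from l'(x) and l''(x).

    log_marginal is l(x), the log marginal density of x (an additive
    constant does not matter). Derivatives use central differences.
    """
    l_minus = log_marginal(x - h)
    l_zero = log_marginal(x)
    l_plus = log_marginal(x + h)
    d1 = (l_plus - l_minus) / (2.0 * h)
    d2 = (l_plus - 2.0 * l_zero + l_minus) / h ** 2
    mean = x + sigma ** 2 * d1
    var = sigma ** 2 * (1.0 + sigma ** 2 * d2)
    return mean, var

# Known case: mu ~ N(0, A), x | mu ~ N(mu, sigma^2), so x ~ N(0, A + sigma^2)
A, sigma = 4.0, 1.0
log_marginal = lambda x: -0.5 * x ** 2 / (A + sigma ** 2)
mean, var = tweedie_posterior(2.0, sigma, log_marginal)
# Closed form for comparison: mean = 2 * 4/5 = 1.6, var = 1 * 4/5 = 0.8
```

With an empirical $\hat l(x)$, the finite differences would be applied to a smooth fit rather than to raw histogram counts, since differentiation amplifies noise.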
23 More General Model for $\mathrm{fdr}_i(z)$
Method 2: multiclass Bayes model $\pi_{i0}, f_{i0}(z), f_{i1}(z)$ with all $f_{i0} = f_0$, but drop the assumption that the non-null distributions $f_{i1}(z)$ are the same.
Define: $w_i(z) = \Pr\{C_i \mid z\}$
Empirical Bayes: regress the $C_i$ indicator on $z_m$.
$\mathrm{fdr}_i(z) \approx \mathrm{fdr}(z) \cdot \dfrac{w_i(0)}{w_i(z)}$
Estimate $w_i(z)$ by logistic regression
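The Method 2 formula can be motivated by Bayes' rule. The steps below are a reconstruction, not taken from the slide; the symbol $w_{i0}(z)$ is introduced here only for the derivation.

```latex
\begin{align*}
\mathrm{fdr}_i(z)
  &= \Pr\{\text{null} \mid z, C_i\}
   = \frac{\Pr\{\text{null} \mid z\}\,\Pr\{C_i \mid \text{null}, z\}}{\Pr\{C_i \mid z\}}
   = \mathrm{fdr}(z)\,\frac{w_{i0}(z)}{w_i(z)},
  \qquad w_{i0}(z) \equiv \Pr\{C_i \mid \text{null}, z\}.
\end{align*}
```

If all classes share the same null density $f_0$, then $w_{i0}(z)$ does not depend on $z$. Near $z = 0$ almost every case is null, so $w_i(0)$ is a natural stand-in for that constant, which gives $\mathrm{fdr}_i(z) \approx \mathrm{fdr}(z)\, w_i(0)/w_i(z)$.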
24 [Figure] First panel: z-values for positions 1750:1759 (solid) compared to all the other positions (line); frequency versus z values, from low CNV to high CNV. Second panel: logistic regression estimate of wi(z) = Prob{1750:1759 | z}; wi(z)/wi(0) versus z value.
25 [Figure] Three estimates of fdrhat for positions 1750:1759: Method 1, Method 2, and combined. The fdr estimate is plotted versus z value, from low CN to high CN.
26 References
Efron, B. (2008). Simultaneous inference: When should hypothesis testing problems be combined? Ann. Appl. Statist.
Tibshirani, R. and Wang, P. (2008). Spatial smoothing and hot spot detection for CGH data using the fused lasso. Biostatistics.
Walther, G. (2009). Optimal and fast detection of spatial clusters with scan statistics. Online, URL gwalther/.
Wang, P., Kim, Y., Pollack, J., Narasimhan, B. and Tibshirani, R. (2005). A method for calling gains and losses in array CGH data. Biostatistics.
Zhang, N., Siegmund, D., Ji, H. and Li, J. (2009). Detecting simultaneous change-points in multiple sequences. Biometrika. Accepted for publication, URL nzhang/.
What do you think of the following research? I m interested in whether a low glycemic index diet gives better control of diabetes than a high glycemic index diet. So I randomly assign 100 people with type
More informationPubHlth Introductory Biostatistics Practice Test I (Without Unit 3 Questions)
1 PubHlth 540 - Introductory Biostatistics Practice Test I (Without Unit 3 Questions) 1. (10 points) In the Honolulu Heart Study, Systolic Blood Pressure was tabulated for 100 Subjects including 37 Smokers
More informationAP Statistics TOPIC A - Unit 2 MULTIPLE CHOICE
AP Statistics TOPIC A - Unit 2 MULTIPLE CHOICE Name Date 1) True or False: In a normal distribution, the mean, median and mode all have the same value and the graph of the distribution is symmetric. 2)
More informationBusiness Statistics Probability
Business Statistics The following was provided by Dr. Suzanne Delaney, and is a comprehensive review of Business Statistics. The workshop instructor will provide relevant examples during the Skills Assessment
More informationOriginal Article Downloaded from jhs.mazums.ac.ir at 22: on Friday October 5th 2018 [ DOI: /acadpub.jhs ]
Iranian journal of health sciences 213;1(3):58-7 http://jhs.mazums.ac.ir Original Article Downloaded from jhs.mazums.ac.ir at 22:2 +33 on Friday October 5th 218 [ DOI: 1.18869/acadpub.jhs.1.3.58 ] A New
More informationIdentifying Change Points in a Covariate Effect on Time-to-Event Analysis with Reduced Isotonic Regression
RESEARCH ARTICLE Identifying Change Points in a Covariate Effect on Time-to-Event Analysis with Reduced Isotonic Regression Yong Ma 1,2 *, Yinglei Lai 1,3, John M. Lachin 1,2 1. The Biostatistics Center,
More informationEarly Learning vs Early Variability 1.5 r = p = Early Learning r = p = e 005. Early Learning 0.
The temporal structure of motor variability is dynamically regulated and predicts individual differences in motor learning ability Howard Wu *, Yohsuke Miyamoto *, Luis Nicolas Gonzales-Castro, Bence P.
More information