MOST: detecting cancer differential gene expression

 Melinda Barker
 6 months ago
 Views:
Transcription
1 Biostatistics (2008), 9, 3, pp doi: /biostatistics/kxm042 Advance Access publication on November 29, 2007 MOST: detecting cancer differential gene expression HENG LIAN Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore SUMMARY We propose a new statistics for the detection of differentially expressed genes when the genes are activated only in a subset of the samples. Statistics designed for this unconventional circumstance has proved to be valuable for most cancer studies, where oncogenes are activated for a small number of disease samples. Previous efforts made in this direction include cancer outlier profile analysis (Tomlins and others, 2005), outlier sum (Tibshirani and Hastie, 2007), and outlier robust tstatistics (Wu, 2007). We propose a new statistics called maximum ordered subset tstatistics (MOST) which seems to be natural when the number of activated samples is unknown. We compare MOST to other statistics and find that the proposed method often has more power then its competitors. Keywords: Cancer; COPA; Differential gene expression; Microarray. 1. INTRODUCTION The most popular method for differential gene expression detection in 2sample microarray studies is to compute the tstatistics. The differentially expressed genes are those whose tstatistics exceed a certain threshold. Recently, many researchers have come to the realization that in many cancer studies, many genes show increased expressions in disease samples, but only for a small number of those samples. The study of Tomlins and others (2005) shows that tstatistics has low power in this case, and they introduced the socalled cancer outlier profile analysis (COPA). Their study shows clearly that COPA can perform better than the traditional tstatistics for cancer microarray data sets. More recently, several progresses have been made in this direction with the aim to design better statistics to account for the heterogeneous activation pattern of the cancer genes. In Tibshirani and Hastie (2007), the authors introduced a new statistics, which they called outlier sum. Later, Wu (2007) proposed outlier robust tstatistics (ORT) and showed it usually outperformed the previously proposed ones in both simulation study and application to real data set. In this paper, we propose another statistics for the detection of cancer differential gene expression which have similar power to ORT when the number of activated samples is very small, but perform better when more samples are differentially expressed. We call our new method the maximum ordered subset tstatistics (MOST). Through simulation studies, we found the new statistics outperformed the previously To whom correspondence should be addressed. c The Author Published by Oxford University Press. All rights reserved. For permissions, please
2 412 H. LIAN proposed ones under some circumstances and never significantly worse in all situations. Thus, we think it is a valuable addition to the dictionary of cancer outlier expression detection. 2. MAXIMUM ORDERED SUBSET tstatistics We consider the simple 2class microarray data for detecting cancer genes. We assume there are n normal samples and m cancer samples. The gene expressions for normal samples are denoted by x ij for genes i = 1, 2,...,p and samples j = 1, 2,...n, while y ij denote the expressions for cancer samples with i = 1, 2,...,p and j = 1, 2,...m. In this paper, we are only interested in 1sided test where the activated genes from cancer samples have a higher expression level. The extension to 2sided test is straightforward. The usual tstatistics (up to a multiplication factor independent of genes) for 2sample test of differences in means is defined for each gene i by T i = x i ȳ i, (2.1) s i where x i = j x ij/n is the average expression of gene i in normal samples, ȳ i = j y ij/m is the average expression of gene i in cancer samples, and s i is the usual pooled standard deviation estimate 1 si 2 j n = (x ij x i ) j m (y ij ȳ i ) 2. n + m 2 The tstatistics is powerful when the alternative distribution is such that y ij, j = 1, 2,...,m, all come from a distribution with a higher mean. Tomlins and others (2005) argues that for most cancer types, heterogeneous activation patterns make tstatistics inefficient for detecting those expression profiles. They defined the COPA statistics C i = q r ({y ij } 1 j m ) med i, (2.2) mad i where q r ( ) is the rth percentile of the data, med i = median({x ij } 1 j n, {y ij } 1 j m ) is the median of the pooled samples for gene i, and mad i = median({x ij med i } 1 j n, {y ij med i } 1 j m ) is the median absolute deviation of the pooled samples. The choice of r in (2.2) depends on the subjective judgement of the user. The use of med i and mad i to replace the mean and the standard deviation in (2.1) is due to robustness considerations since it is already known that some of the genes are differentially expressed. In (2.2), only one value of {y ij } is used in the computation. A more efficient strategy would be to use additional expression values. Let O i ={y ij : y ij > q 75 ({x ij } 1 j n, {y ij } 1 j m ) + IQR({x ij } 1 j n, {y ij } 1 j m )} (2.3) be the outliers from the cancer samples for gene i, where IQR( ) is the interquartile range of the data. The OS statistics from Tibshirani and Hastie (2007) is then defined as y ij O i (y ij med i ) OS i =. (2.4) mad i More recently, Wu (2007) studied ORT statistics, which is similar to OS statistics. The important difference that makes ORT superior is that outliers are defined relative to the normal sample instead of the pooled sample. So in their definition, O i ={y ij : y ij > q 75 ({x ij } 1 j n ) + IQR({x ij } 1 j n )}. (2.5)
3 MOST: detecting cancer differential gene expression 413 By similar reasoning, med i in OS is replaced by med ix and mad j by median({x ij med ix } 1 j n, {y ij med iy } 1 j m ), where med ix and med iy are the medians of normal and cancer samples, respectively. In both OS and ORT statistics, the outliers are defined somewhat arbitrarily with no convincing reasons. To address this question, we propose the following statistics that implicitly considers all possible values for outlier thresholds. Suppose for notational simplicity that {y ij } 1 j m are ordered for each i: y i1 y i2 y im. If the number of samples where oncogenes are activated is known, we would naturally define the statistics as 1 j k M ik = (y ij med ix ) median({x ij med ix } 1 j n, {y ij med iy } 1 j m ). (2.6) When k is not known to us, one would be tempted to define M i = max M ik. 1 k m But this does not quite work since obviously M ik for different values of k are not directly comparable under the null distribution that x ij, y ij N(0, 1). For example, when m = 2, we have E[y i1 med ix ] > 0, while E [ j=1,2 (y ij med ix ) ] = 0. This observation motivates us to normalize M ik such that each approximately has mean 0 and variance 1. This can be achieved by defining µ k = E [ 1 j k z j] and σk 2 = Var( 1 j k z j), where z1 > z 2 > > z m is the order statistics of m samples generated from the standard normal distribution. Then, we can define M ik as ( 1 j k M ik = (y )/ ij med ix ) median({x ij med ix } 1 j n, {y ij med iy } 1 j m ) µ k σ k (2.7) so that M ik has mean and variance approximately equal to 0 and 1, respectively. Finally, we can define our new statistics (called MOST) as M i = max M ik. (2.8) 1 k m With MOST, we practically consider every possible threshold above which y ij are taken to be outliers. In this formulation, the number of outliers is implicitly defined as arg max M ik. (2.9) 1 k m 3. SIMULATION STUDIES AND APPLICATION Some simulations are carried out to study MOST and compare its performance to OS, ORT, COPA, and tstatistics. For COPA, we choose to use the 90th percentile in its definition as in Tibshirani and Hastie (2007). We generate the expression data from standard normal with n = m = 20. For various values k, 1 k m, which is the number of differentially expressed cancer samples, a constant µ is added for differentially expressed genes. We simulated 1000 differentially and nondifferentially expressed genes and calculated the receiver operating characteristic (ROC) curves from them by choosing different thresholds for gene calls. Figures 1 and 2 plot the ROC curves for some combinations of k and µ. Forµ = 2 and k small, all 5 statistics behave similarly with tstatistics performing the worst. As k increases, t becomes better and OS and COPA begin to lose power. For µ = 1 and medium to large k, the performance of MOST is only worse than t and better than other statistics. Smaller k in this case basically leads to ROC curve that is close to a 45 line for all statistics since the signal µ = 1 is too weak in this case, so we do not show these
4 414 H. LIAN Fig. 1. ROC curves estimated based on simulation. The number of normal/cancer sample is n = m = 20. Various combinations of µ and k s are chosen. Other uninteresting results where all statistics have close to perfectly good or bad performances are excluded as explained in the main text.
5 MOST: detecting cancer differential gene expression 415 Fig. 2. More ROC curves. results. For µ = 4 and small k, MOST is better than ORT, COPA, and t, and in this situation, only OS is competitive with MOST. Larger k in this case will produce nearly perfect ROC curves for all statistics, and thus those results are also omitted. Besides ROC curves, we have also tried examining the possibility of using (2.9) for estimating the number of differentially expressed samples k but so far have been unable to get a reasonable estimate out of it. From the above simulations, we judge that our new estimate MOST is at least as good as other previously proposed statistics, sometimes much better. Thus, it is a valuable tool for detecting activated genes in many situations. As an example of real data application, the data from West and others (2001) is publicly available from The microarray used in the breast cancer study contains 7129 genes and 49 tumor samples, 25 of which with no positive lymph nodes identified and the other 24 with positive nodes. Similar to Wu (2007), we take the log transformation of the expressions after normalizing the data. We apply all the above mentioned statistics to the data. For real data application, we need to use 2sided test. The modification for MOST required is straightforward. We just use 1 j k m ik = (y ij med ix ) median({x ij med ix } 1 j n, {y ij med iy } 1 j m ). (3.1)
6 416 H. LIAN This time with y ij ordered such that y i1 y i2 y im. We also need to normalize m ik as in (2.7) and then m i = min 1 k m m ik. This test will detect downwardregulated genes in disease samples. The 2sided test will be the maximum of the absolute values of M i and m i, but the sign is kept. A search of PubMed returns 908 genes related to breast cancer, and these are mapped to 655 probe sets on the Affymetrix HuGeneFL microarray used in this experiment. The ranking of these are computed for all the 5 statistics. The differences in ranking between MOST and other statistics are plotted in Figure 3 as histograms. Negative values in the histograms show the superiority of the MOST statistics since they imply higher ranking to these breast cancer related genes given by MOST. The histograms show that for this data, the performance of MOST is superior to tstatistics (shown by the heavy left tail in the histogram) and similar to others. In a second example, we consider a subset of acute lymphoblastic leukemia (ALL) data consisting 79 samples patients with Bcell acute lymphoblastic leukemia, available from the Bioconductor project. The comparison is between 37 samples with the BCR/ABL fusion gene resulting from a translocation and 42 normal samples. In total, 358 probe sets on the array are annotated with the gene ontology term tyrosine Fig. 3. Comparison between MOST and other statistics on breast cancer data. The histogram shows the difference in ranking on 655 probe sets. In these figures, leftskewed histogram implies superiority of MOST.
7 MOST: detecting cancer differential gene expression 417 Fig. 4. Comparison between MOST and other statistics on ALL data. kinase activity, which is believed to mediate many of the effects due to BCR/ABL translocation. After a simple filtering to remove probe sets that are not expressed, 94 of those 358 probe sets remain. The differences in ranking between MOST and other statistics are shown as histograms in Figure 4. For this data, ttest and MOST seem to be superior to other methods. The R implementation of MOST statistics as well as sample code for simulation is available at REFERENCES TIBSHIRANI, R. AND HASTIE, T. (2007). Outlier sums for differential gene expression analysis. Biostatistics 8, 2 8. TOMLINS, S. A., RHODES, D. R., PERNER, S., DHANASEKARAN, S. M., MEHRA, R., SUN, X. W., VARAMBALLY, S., CAO, X., TCHINDA, J., KUEFER, R. and others (2005). Recurrent fusion of TMPRSS2 and ETS transcription factor genes in prostate cancer. Science 310, WEST, M.,BLANCHETTE, C., DRESSMAN, H.,HUANG, E.,ISHIDA, S., SPANG, R., ZUZAN, H.,OLSON, J.A.J., MARKS, J. R. AND NEVINS, J. R. (2001). Predicting the clinical status of human breast cancer by using
8 418 H. LIAN gene expression profiles. Proceedings of the National Academy of Sciences of the United States of America 98, WU, B. (2007). Cancer outlier differential gene expression detection. Biostatistics 8, [Received August 8, 2007; first revision September 24, 2007; second revision October 23, 2007; accepted for publication October 25, 2007]
Cancer outlier differential gene expression detection
Biostatistics (2007), 8, 3, pp. 566 575 doi:10.1093/biostatistics/kxl029 Advance Access publication on October 4, 2006 Cancer outlier differential gene expression detection BAOLIN WU Division of Biostatistics,
More informationBayesian Confidence Intervals for Means and Variances of Lognormal and Bivariate Lognormal Distributions
Bayesian Confidence Intervals for Means and Variances of Lognormal and Bivariate Lognormal Distributions J. Harvey a,b, & A.J. van der Merwe b a Centre for Statistical Consultation Department of Statistics
More informationVU Biostatistics and Experimental Design PLA.216
VU Biostatistics and Experimental Design PLA.216 Julia Feichtinger Postdoctoral Researcher Institute of Computational Biotechnology Graz University of Technology Outline for Today About this course Background
More informationChapter 1: Exploring Data
Chapter 1: Exploring Data Key Vocabulary:! individual! variable! frequency table! relative frequency table! distribution! pie chart! bar graph! twoway table! marginal distributions! conditional distributions!
More informationAP Statistics. Semester One Review Part 1 Chapters 15
AP Statistics Semester One Review Part 1 Chapters 15 AP Statistics Topics Describing Data Producing Data Probability Statistical Inference Describing Data Ch 1: Describing Data: Graphically and Numerically
More informationLecture Outline. Biost 517 Applied Biostatistics I. Purpose of Descriptive Statistics. Purpose of Descriptive Statistics
Biost 517 Applied Biostatistics I Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Lecture 3: Overview of Descriptive Statistics October 3, 2005 Lecture Outline Purpose
More informationDiagnosis of multiple cancer types by shrunken centroids of gene expression
Diagnosis of multiple cancer types by shrunken centroids of gene expression Robert Tibshirani, Trevor Hastie, Balasubramanian Narasimhan, and Gilbert Chu PNAS 99:10:65676572, 14 May 2002 Nearest Centroid
More informationChapter 1: Introduction to Statistics
Chapter 1: Introduction to Statistics Variables A variable is a characteristic or condition that can change or take on different values. Most research begins with a general question about the relationship
More informationCase Studies on High Throughput Gene Expression Data Kun Huang, PhD Raghu Machiraju, PhD
Case Studies on High Throughput Gene Expression Data Kun Huang, PhD Raghu Machiraju, PhD Department of Biomedical Informatics Department of Computer Science and Engineering The Ohio State University Review
More informationSawtooth Software. MaxDiff Analysis: Simple Counting, IndividualLevel Logit, and HB RESEARCH PAPER SERIES. Bryan Orme, Sawtooth Software, Inc.
Sawtooth Software RESEARCH PAPER SERIES MaxDiff Analysis: Simple Counting, IndividualLevel Logit, and HB Bryan Orme, Sawtooth Software, Inc. Copyright 009, Sawtooth Software, Inc. 530 W. Fir St. Sequim,
More informationPRACTICAL STATISTICS FOR MEDICAL RESEARCH
PRACTICAL STATISTICS FOR MEDICAL RESEARCH Douglas G. Altman Head of Medical Statistical Laboratory Imperial Cancer Research Fund London CHAPMAN & HALL/CRC Boca Raton London New York Washington, D.C. Contents
More informationTitle: A robustness study of parametric and nonparametric tests in ModelBased Multifactor Dimensionality Reduction for epistasis detection
Author's response to reviews Title: A robustness study of parametric and nonparametric tests in ModelBased Multifactor Dimensionality Reduction for epistasis detection Authors: Jestinah M Mahachie John
More informationClassification of cancer profiles. ABDBM Ron Shamir
Classification of cancer profiles 1 Background: Cancer Classification Cancer classification is central to cancer treatment; Traditional cancer classification methods: location; morphology, cytogenesis;
More informationDetermining the Vulnerabilities of the Power Transmission System
Determining the Vulnerabilities of the Power Transmission System B. A. Carreras D. E. Newman I. Dobson Depart. Fisica Universidad Carlos III Madrid, Spain Physics Dept. University of Alaska Fairbanks AK
More informationInvestigating the robustness of the nonparametric Levene test with more than two groups
Psicológica (2014), 35, 361383. Investigating the robustness of the nonparametric Levene test with more than two groups David W. Nordstokke * and S. Mitchell Colp University of Calgary, Canada Testing
More information2016 Children and young people s inpatient and day case survey
NHS Patient Survey Programme 2016 Children and young people s inpatient and day case survey Technical details for analysing trustlevel results Published November 2017 CQC publication Contents 1. Introduction...
More informationOn the Combination of Collaborative and Itembased Filtering
On the Combination of Collaborative and Itembased Filtering Manolis Vozalis 1 and Konstantinos G. Margaritis 1 University of Macedonia, Dept. of Applied Informatics Parallel Distributed Processing Laboratory
More informationLouis Leon Thurstone in Monte Carlo: Creating Error Bars for the Method of Paired Comparison
Louis Leon Thurstone in Monte Carlo: Creating Error Bars for the Method of Paired Comparison Ethan D. Montag Munsell Color Science Laboratory, Chester F. Carlson Center for Imaging Science Rochester Institute
More informationTechnical Specifications
Technical Specifications In order to provide summary information across a set of exercises, all tests must employ some form of scoring models. The most familiar of these scoring models is the one typically
More informationAnalysis of Rheumatoid Arthritis Data using Logistic Regression and Penalized Approach
University of South Florida Scholar Commons Graduate Theses and Dissertations Graduate School November 2015 Analysis of Rheumatoid Arthritis Data using Logistic Regression and Penalized Approach Wei Chen
More informationOn Regression Analysis Using Bivariate Extreme Ranked Set Sampling
On Regression Analysis Using Bivariate Extreme Ranked Set Sampling Atsu S. S. Dorvlo atsu@squ.edu.om Walid AbuDayyeh walidan@squ.edu.om Obaid Alsaidy obaidalsaidy@gmail.com Abstract Many forms of ranked
More informationbreast cancer; relative risk; risk factor; standard deviation; strength of association
American Journal of Epidemiology The Author 2015. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health. All rights reserved. For permissions, please email:
More informationA Comparison of Methods for Determining HIV Viral Set Point
STATISTICS IN MEDICINE Statist. Med. 2006; 00:1 6 [Version: 2002/09/18 v1.11] A Comparison of Methods for Determining HIV Viral Set Point Y. Mei 1, L. Wang 2, S. E. Holte 2 1 School of Industrial and Systems
More informationMulticlass cancer outlier differential gene expression detection Supplementary Information
Multiclass cancer outlier differential gene expression detection Supplementary Information Fang Liu, Baolin Wu 1 Breast cancer microarray data analysis The breast cancer microarray data (West et al.,
More informationUnderstandable Statistics
Understandable Statistics correlated to the Advanced Placement Program Course Description for Statistics Prepared for Alabama CC2 6/2003 2003 Understandable Statistics 2003 correlated to the Advanced Placement
More informationIntroduction & Basics
CHAPTER 1 Introduction & Basics 1.1 Statistics the Field... 1 1.2 Probability Distributions... 4 1.3 Study Design Features... 9 1.4 Descriptive Statistics... 13 1.5 Inferential Statistics... 16 1.6 Summary...
More informationComparison of the Null Distributions of
Comparison of the Null Distributions of Weighted Kappa and the C Ordinal Statistic Domenic V. Cicchetti West Haven VA Hospital and Yale University Joseph L. Fleiss Columbia University It frequently occurs
More informationContent. Basic Statistics and Data Analysis for Health Researchers from Foreign Countries. Research question. Example Newly diagnosed Type 2 Diabetes
Content Quantifying association between continuous variables. Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Volkert Siersma siersma@sund.ku.dk The Research Unit for General
More informationRELYING ON TRUST TO FIND RELIABLE INFORMATION. Keywords Reputation; recommendation; trust; information retrieval; open distributed systems.
Abstract RELYING ON TRUST TO FIND RELIABLE INFORMATION Alfarez AbdulRahman and Stephen Hailes Department of Computer Science, University College London Gower Street, London WC1E 6BT, United Kingdom Email:
More informationChapter 4 Cellular Oncogenes ~ 4.6 
Chapter 4 Cellular Oncogenes  4.2 ~ 4.6  Many retroviruses carrying oncogenes have been found in chickens and mice However, attempts undertaken during the 1970s to isolate viruses from most types of
More informationAttentional Theory Is a Viable Explanation of the Inverse Base Rate Effect: A Reply to Winman, Wennerholm, and Juslin (2003)
Journal of Experimental Psychology: Learning, Memory, and Cognition 2003, Vol. 29, No. 6, 1396 1400 Copyright 2003 by the American Psychological Association, Inc. 02787393/03/$12.00 DOI: 10.1037/02787393.29.6.1396
More informationStudents were asked to report how far (in miles) they each live from school. The following distances were recorded. 1 Zane Jackson 0.
Identifying Outliers Task Students were asked to report how far (in miles) they each live from school. The following distances were recorded. Student Distance 1 Zane 0.4 2 Jackson 0.5 3 Benjamin 1.0 4
More informationResponsiveness to feedback as a personal trait
Responsiveness to feedback as a personal trait Thomas Buser University of Amsterdam Leonie Gerhards University of Hamburg Joël van der Weele University of Amsterdam Pittsburgh, June 12, 2017 1 / 30 Feedback
More informationSection on Survey Research Methods JSM 2009
Missing Data and Complex Samples: The Impact of Listwise Deletion vs. Subpopulation Analysis on Statistical Bias and Hypothesis Test Results when Data are MCAR and MAR Bethany A. Bell, Jeffrey D. Kromrey
More informationModel Selection Methods for Cancer Staging and Other Disease Stratification Problems. Yunzhi Lin
Model Selection Methods for Cancer Staging and Other Disease Stratification Problems by Yunzhi Lin A dissertation submitted in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY
More informationMethodological skills
Methodological skills rma linguistics, week 3 Tamás Biró ACLC University of Amsterdam t.s.biro@uva.nl Tamás Biró, UvA 1 Topics today Parameter of the population. Statistic of the sample. Re: descriptive
More informationTITLE: Small Molecule Inhibitors of ERG and ETV1 in Prostate Cancer
AD Award Number: W81XWH1210399 TITLE: Small Molecule Inhibitors of ERG and ETV1 in Prostate Cancer PRINCIPAL INVESTIGATOR: Colm Morrissey CONTRACTING ORGANIZATION: University of Washington Seattle WA
More informationBayesian Prediction Tree Models
Bayesian Prediction Tree Models Statistical Prediction Tree Modelling for ClinicoGenomics Clinical gene expression data  expression signatures, profiling Tree models for predictive subtyping Combining
More informationAssessing the diagnostic accuracy of a sequence of tests
Biostatistics (2003), 4, 3,pp. 341 351 Printed in Great Britain Assessing the diagnostic accuracy of a sequence of tests MARY LOU THOMPSON Department of Biostatistics, Box 357232, University of Washington,
More informationAn Empirical and Formal Analysis of Decision Trees for Ranking
An Empirical and Formal Analysis of Decision Trees for Ranking Eyke Hüllermeier Department of Mathematics and Computer Science Marburg University 35032 Marburg, Germany eyke@mathematik.unimarburg.de Stijn
More informationA Comparison of Methods of Estimating Subscale Scores for MixedFormat Tests
A Comparison of Methods of Estimating Subscale Scores for MixedFormat Tests David Shin Pearson Educational Measurement May 007 rr0701 Using assessment and research to promote learning Pearson Educational
More informationDetection Theory: Sensitivity and Response Bias
Detection Theory: Sensitivity and Response Bias Lewis O. Harvey, Jr. Department of Psychology University of Colorado Boulder, Colorado The Brain (Observable) Stimulus System (Observable) Response System
More informationUNIVERSITY OF THE FREE STATE DEPARTMENT OF COMPUTER SCIENCE AND INFORMATICS CSIS6813 MODULE TEST 2
UNIVERSITY OF THE FREE STATE DEPARTMENT OF COMPUTER SCIENCE AND INFORMATICS CSIS6813 MODULE TEST 2 DATE: 3 May 2017 MARKS: 75 ASSESSOR: Prof PJ Blignaut MODERATOR: Prof C de Villiers (UP) TIME: 2 hours
More informationSTT315 Chapter 2: Methods for Describing Sets of Data  Part 2
Chapter 2.5 Interpreting Standard Deviation Chebyshev Theorem Empirical Rule Chebyshev Theorem says that for ANY shape of data distribution at least 3/4 of all data fall no farther from the mean than 2
More informationIntroduction to Discrimination in Microarray Data Analysis
Introduction to Discrimination in Microarray Data Analysis Jane Fridlyand CBMB University of California, San Francisco Genentech Hall Auditorium, Mission Bay, UCSF October 23, 2004 1 Case Study: Van t
More informationThe reliability and validity of the Index of Complexity, Outcome and Need for determining treatment need in Dutch orthodontic practice
European Journal of Orthodontics 28 (2006) 58 64 doi:10.1093/ejo/cji085 Advance Access publication 8 November 2005 The Author 2005. Published by Oxford University Press on behalf of the European Orthodontics
More informationSawtooth Software. The Number of Levels Effect in Conjoint: Where Does It Come From and Can It Be Eliminated? RESEARCH PAPER SERIES
Sawtooth Software RESEARCH PAPER SERIES The Number of Levels Effect in Conjoint: Where Does It Come From and Can It Be Eliminated? Dick Wittink, Yale University Joel Huber, Duke University Peter Zandan,
More informationEvidence Based Medicine
Course Goals Goals 1. Understand basic concepts of evidence based medicine (EBM) and how EBM facilitates optimal patient care. 2. Develop a basic understanding of how clinical research studies are designed
More information1) What is the independent variable? What is our Dependent Variable?
1) What is the independent variable? What is our Dependent Variable? Independent Variable: Whether the font color and word name are the same or different. (Congruency) Dependent Variable: The amount of
More informationThe Relative Performance of Full Information Maximum Likelihood Estimation for Missing Data in Structural Equation Models
University of Nebraska  Lincoln DigitalCommons@University of Nebraska  Lincoln Educational Psychology Papers and Publications Educational Psychology, Department of 712001 The Relative Performance of
More informationStatistical analysis DIANA SAPLACAN 2017 * SLIDES ADAPTED BASED ON LECTURE NOTES BY ALMA LEORA CULEN
Statistical analysis DIANA SAPLACAN 2017 * SLIDES ADAPTED BASED ON LECTURE NOTES BY ALMA LEORA CULEN Vs. 2 Background 3 There are different types of research methods to study behaviour: Descriptive: observations,
More informationFixedEffect Versus RandomEffects Models
PART 3 FixedEffect Versus RandomEffects Models Introduction to MetaAnalysis. Michael Borenstein, L. V. Hedges, J. P. T. Higgins and H. R. Rothstein 2009 John Wiley & Sons, Ltd. ISBN: 9780470057247
More informationBuilding MultiMarker Algorithms for Disease Prediction The Role of Correlations Among Markers
Biomarker Insights Original Research Open Access Full open access to this and thousands of other papers at http://www.lapress.com. Building MultiMarker Algorithms for Disease Prediction The Role of Correlations
More informationUnderstanding DNA Copy Number Data
Understanding DNA Copy Number Data Adam B. Olshen Department of Epidemiology and Biostatistics Helen Diller Family Comprehensive Cancer Center University of California, San Francisco http://cc.ucsf.edu/people/olshena_adam.php
More informationMotivation: Fraud Detection
Outlier Detection Motivation: Fraud Detection http://i.imgur.com/ckkoaop.gif Jian Pei: CMPT 741/459 Data Mining  Outlier Detection (1) 2 Techniques: Fraud Detection Features Dissimilarity Groups and
More informationIndividualized Treatment Effects Using a Nonparametric Bayesian Approach
Individualized Treatment Effects Using a Nonparametric Bayesian Approach Ravi Varadhan Nicholas C. Henderson Division of Biostatistics & Bioinformatics Department of Oncology Johns Hopkins University
More informationStatistics 202: Data Mining. c Jonathan Taylor. Final review Based in part on slides from textbook, slides of Susan Holmes.
Final review Based in part on slides from textbook, slides of Susan Holmes December 5, 2012 1 / 1 Final review Overview Before Midterm General goals of data mining. Datatypes. Preprocessing & dimension
More informationLab 2: Nonlinear regression models
Lab 2: Nonlinear regression models The flint water crisis has received much attention in the past few months. In today s lab, we will apply the nonlinear regression to analyze the lead testing results
More informationMODEL PERFORMANCE ANALYSIS AND MODEL VALIDATION IN LOGISTIC REGRESSION
STATISTICA, anno LXIII, n. 2, 2003 MODEL PERFORMANCE ANALYSIS AND MODEL VALIDATION IN LOGISTIC REGRESSION 1. INTRODUCTION Regression models are powerful tools frequently used to predict a dependent variable
More informationManifestation Of Differences In ItemLevel Characteristics In ScaleLevel Measurement Invariance Tests Of MultiGroup Confirmatory Factor Analyses
Journal of Modern Applied Statistical Methods Copyright 2005 JMASM, Inc. May, 2005, Vol. 4, No.1, 275282 1538 9472/05/$95.00 Manifestation Of Differences In ItemLevel Characteristics In ScaleLevel Measurement
More informationBusiness Statistics (ECOE 1302) Spring Semester 2011 Chapter 3  Numerical Descriptive Measures Solutions
The Islamic University of Gaza Faculty of Commerce Department of Economics and Political Sciences Business Statistics (ECOE 1302) Spring Semester 2011 Chapter 3  Numerical Descriptive Measures Solutions
More informationChapter 21 Multilevel Propensity Score Methods for Estimating Causal Effects: A Latent Class Modeling Strategy
Chapter 21 Multilevel Propensity Score Methods for Estimating Causal Effects: A Latent Class Modeling Strategy JeeSeon Kim and Peter M. Steiner Abstract Despite their appeal, randomized experiments cannot
More informationProposing a New Term Weighting Scheme for Text Categorization
Proposing a New Term Weighting Scheme for Text Categorization Man LAN Institute for Infocomm Research 21 Heng Mui Keng Terrace Singapore 119613 lanman@i2r.astar.edu.sg ChewLim TAN School of Computing
More informationPredicting clinical outcomes in neuroblastoma with genomic data integration
Predicting clinical outcomes in neuroblastoma with genomic data integration Ilyes Baali, 1 Alp Emre Acar 1, Tunde Aderinwale 2, Saber HafezQorani 3, Hilal Kazan 4 1 Department of ElectricElectronics Engineering,
More informationStatistical Considerations for Post Market Surveillance
Statistical Considerations for Post Market Surveillance 30 th International Conference on Phamacoepidemiology and Therapeutic Risk Management Ted Lystig, Ph.D. MEDTRONIC INC. Distinguished Statistician
More informationA Real Case Comparison of Average and Population Bioequivalence for Evaluation of APSD data
A Real Case Comparison of Average and Population Bioequivalence for Evaluation of APSD data IPACRS/UF Orlando Inhalation Conference March 1820, 2014 Dennis Sandell, S5 Consulting Contents The data brief
More informationHuman population substructure and genetic association studies
Human population substructure and genetic association studies Stephanie A. Santorico, Ph.D. Department of Mathematical & Statistical Sciences Stephanie.Santorico@ucdenver.edu Global Similarity Map from
More informationOnline Supplement to A DataDriven Model of an AppointmentGenerated Arrival Process at an Outpatient Clinic
Online Supplement to A DataDriven Model of an AppointmentGenerated Arrival Process at an Outpatient Clinic SongHee Kim, Ward Whitt and Won Chul Cha USC Marshall School of Business, Los Angeles, CA 989,
More informationIntroduction to Metaanalysis of Accuracy Data
Introduction to Metaanalysis of Accuracy Data Hans Reitsma MD, PhD Dept. of Clinical Epidemiology, Biostatistics & Bioinformatics Academic Medical Center  Amsterdam Continental European Support Unit
More informationA Comparison of Several GoodnessofFit Statistics
A Comparison of Several GoodnessofFit Statistics Robert L. McKinley The University of Toledo Craig N. Mills Educational Testing Service A study was conducted to evaluate four goodnessoffit procedures
More informationThinking about Inference
CHAPTER 15 c Jupiterimages/Age fotostock Thinking about Inference IN THIS CHAPTER WE COVER... To this point, we have met just two procedures for statistical inference. Both concern inference about the
More informationStatistical Techniques. Masoud Mansoury and Anas Abulfaraj
Statistical Techniques Masoud Mansoury and Anas Abulfaraj What is Statistics? https://www.youtube.com/watch?v=lmmzj7599pw The definition of Statistics The practice or science of collecting and analyzing
More informationABSTRACT THE INDEPENDENT MEANS TTEST AND ALTERNATIVES SESUG Paper PO10
SESUG 01 Paper PO10 PROC TTEST (Old Friend), What Are You Trying to Tell Us? Diep Nguyen, University of South Florida, Tampa, FL Patricia Rodríguez de Gil, University of South Florida, Tampa, FL Eun Sook
More informationLinking Errors in Trend Estimation in LargeScale Surveys: A Case Study
Research Report Linking Errors in Trend Estimation in LargeScale Surveys: A Case Study Xueli Xu Matthias von Davier April 2010 ETS RR1010 Listening. Learning. Leading. Linking Errors in Trend Estimation
More informationROC Curve. Brawijaya Professional Statistical Analysis BPSA MALANG Jl. Kertoasri 66 Malang (0341)
ROC Curve Brawijaya Professional Statistical Analysis BPSA MALANG Jl. Kertoasri 66 Malang (0341) 580342 ROC Curve The ROC Curve procedure provides a useful way to evaluate the performance of classification
More informationBreast screening: understanding case difficulty and the nature of errors
Loughborough University Institutional Repository Breast screening: understanding case difficulty and the nature of errors This item was submitted to Loughborough University's Institutional Repository by
More informationInternal Consistency: Do We Really Know What It Is and How to. Assess it?
Internal Consistency: Do We Really Know What It Is and How to Assess it? Wei Tang, Ying Cui, CRAME, Department of Educational Psychology University of Alberta, CANADA The term internal consistency has
More information3. Fixedsample Clinical Trial Design
3. Fixedsample Clinical Trial Design 3. Fixedsample clinical trial design 3.1 On choosing the sample size 3.2 Frequentist evaluation of a fixedsample trial 3.3 Bayesian evaluation of a fixedsample
More informationMultiple Comparisons and the Known or Potential Error Rate
Journal of Forensic Economics 19(2), 2006, pp. 231236 2007 by the National Association of Forensic Economics Multiple Comparisons and the Known or Potential Error Rate David Tabak * I. Introduction Many
More informationA New Bliss Independence Model to Analyze Drug Combination Data
51867JBXXXX10.1177/108705711451867Journal of Biomolecular ScreeningZhao et al. researcharticle014 Technical Note A New Bliss Independence Model to Analyze Drug Combination Data Journal of Biomolecular
More informationReliability, validity, and all that jazz
Reliability, validity, and all that jazz Dylan Wiliam King s College London Introduction No measuring instrument is perfect. The most obvious problems relate to reliability. If we use a thermometer to
More informationOn perfect workingmemory performance with large numbers of items
Psychon Bull Rev (2011) 18:958 963 DOI 10.3758/s1342301101087 BRIEF REPORT On perfect workingmemory performance with large numbers of items Jonathan E. Thiele & Michael S. Pratte & Jeffrey N. Rouder
More informationStatistics Guide. Prepared by: Amanda J. Rockinson Szapkiw, Ed.D.
This guide contains a summary of the statistical terms and procedures. This guide can be used as a reference for course work and the dissertation process. However, it is recommended that you refer to statistical
More informationTHE IMPORTANCE OF THE NORMALITY ASSUMPTION IN LARGE PUBLIC HEALTH DATA SETS
Annu. Rev. Public Health 2002. 23:151 69 DOI: 10.1146/annurev.publheath.23.100901.140546 Copyright c 2002 by Annual Reviews. All rights reserved THE IMPORTANCE OF THE NORMALITY ASSUMPTION IN LARGE PUBLIC
More informationRisk Aversion in Games of Chance
Risk Aversion in Games of Chance Imagine the following scenario: Someone asks you to play a game and you are given $5,000 to begin. A ball is drawn from a bin containing 39 balls each numbered 139 and
More informationTESTING at any level (e.g., production, field, or onboard)
IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, VOL. 54, NO. 3, JUNE 2005 1003 A Bayesian Approach to Diagnosis and Prognosis Using BuiltIn Test John W. Sheppard, Senior Member, IEEE, and Mark A.
More informationIntroduction to Quantitative Methods (SR8511) Project Report
Introduction to Quantitative Methods (SR8511) Project Report Exploring the variables related to and possibly affecting the consumption of alcohol by adults Student Registration number: 554561 Word counts
More informationChapter 11. Experimental Design: OneWay Independent Samples Design
111 Chapter 11. Experimental Design: OneWay Independent Samples Design Advantages and Limitations Comparing Two Groups Comparing t Test to ANOVA Independent Samples t Test Independent Samples ANOVA Comparing
More informationHere are the various choices. All of them are found in the Analyze menu in SPSS, under the submenu for Descriptive Statistics :
Descriptive Statistics in SPSS When first looking at a dataset, it is wise to use descriptive statistics to get some idea of what your data look like. Here is a simple dataset, showing three different
More informationKnowledge discovery tools 381
Knowledge discovery tools 381 hours, and prime time is prime time precisely because more people tend to watch television at that time.. Compare histograms from di erent periods of time. Changes in histogram
More informationLearning to Cook: An Exploration of Recipe Data
Learning to Cook: An Exploration of Recipe Data Travis Arffa (tarffa), Rachel Lim (rachelim), Jake Rachleff (jakerach) Abstract Using recipe data scraped from the internet, this project successfully implemented
More informationJSM Survey Research Methods Section
Methods and Issues in Trimming Extreme Weights in Sample Surveys Frank Potter and Yuhong Zheng Mathematica Policy Research, P.O. Box 393, Princeton, NJ 08543 Abstract In survey sampling practice, unequal
More informationHow many attributes should be used in a paired comparison task?
How many attributes should be used in a paired comparison task? An empirical examination using a new validation approach Heinz Holling + Torsten Melles Wolfram Reiners February 1998 + Heinz Holling is
More informationGCE. Statistics (MEI) OCR Report to Centres. June Advanced Subsidiary GCE AS H132. Oxford Cambridge and RSA Examinations
GCE Statistics (MEI) Advanced Subsidiary GCE AS H132 OCR Report to Centres June 2013 Oxford Cambridge and RSA Examinations OCR (Oxford Cambridge and RSA) is a leading UK awarding body, providing a wide
More informationReview+Practice. May 30, 2012
Review+Practice May 30, 2012 Final: Tuesday June 5 8:3010:20 Venue: Sections AA and AB (EEB 125), sections AC and AD (EEB 105), sections AE and AF (SIG 134) Format: Short answer. Bring: calculator, BRAINS
More informationDiagnostic imaging evaluating image quality using visual grading characteristic (VGC) analysis
Vet Res Commun (2010) 34:473 479 DOI 10.1007/s1125901094132 SHORT COMMUNICATION Diagnostic imaging evaluating image quality using visual grading characteristic (VGC) analysis Eberhard Ludewig & Andreas
More informationIndividual and triallevel surrogacy in colorectal cancer
Stat Methods Med Res OnlineFirst, published on February 19, 2008 as doi:10.1177/0962280207081864 SMM081864 2008/2/5 page 1 Statistical Methods in Medical Research 2008; 1 9 Individual and triallevel
More informationTesting the representation of time in reference memory in the bisection and the generalization task: The utility of a developmental approach
PQJE178995 TECHSET COMPOSITION LTD, SALISBURY, U.K. 6/16/2006 THE QUARTERLY JOURNAL OF EXPERIMENTAL PSYCHOLOGY 0000, 00 (0), 1 17 Testing the representation of time in reference memory in the bisection
More informationEPIDEMIOLOGY. Training module
1. Scope of Epidemiology Definitions Clinical epidemiology Epidemiology research methods Difficulties in studying epidemiology of Pain 2. Measures used in Epidemiology Disease frequency Disease risk Disease
More informationEffective Implementation of Bayesian Adaptive Randomization in Early Phase Clinical Development. Pantelis Vlachos.
Effective Implementation of Bayesian Adaptive Randomization in Early Phase Clinical Development Pantelis Vlachos Cytel Inc, Geneva Acknowledgement Joint work with Giacomo Mordenti, Grünenthal Virginie
More information