Checking the Clinical Information for Docetaxel

Similar documents
Matching the Cisplatin Heatmap

Quality assessment of TCGA Agilent gene expression data for ovary cancer

GS Analysis of Microarray Data

The Importance of Reproducibility in High-Throughput Biology: Case Studies in Forensic Bioinformatics

MethylMix An R package for identifying DNA methylation driven genes

GS Analysis of Microarray Data

Hands-On Ten The BRCA1 Gene and Protein

Checking Drug Sensitivity of Cell Lines Used in Signatures

User Guide. Association analysis. Input

Consultation on publication of new cancer waiting times statistics Summary Feedback Report

Computing discriminability and bias with the R software

Stat Wk 9: Hypothesis Tests and Analysis

Data mining with Ensembl Biomart. Stéphanie Le Gras

STAT 201 Chapter 3. Association and Regression

Introduction to antiprofiles

Module 3: Pathway and Drug Development

Table of Contents Foreword 9 Stay Informed 9 Introduction to Visual Steps 10 What You Will Need 10 Basic Knowledge 11 How to Use This Book

Package MSstatsTMT. February 26, Title Protein Significance Analysis in shotgun mass spectrometry-based

DERIVING CHEMOSENSITIVITY FROM CELL LINES: FORENSIC BIOINFORMATICS AND REPRODUCIBLE RESEARCH IN HIGH-THROUGHPUT BIOLOGY

MULTIPLE LINEAR REGRESSION 24.1 INTRODUCTION AND OBJECTIVES OBJECTIVES

Supplementary Data. Correlation analysis. Importance of normalizing indices before applying SPCA

Impact of Group Rejections on Self Esteem

How To Use MiRSEA. Junwei Han. July 1, Overview 1. 2 Get the pathway-mirna correlation profile(pmset) and a weighting matrix 2

Using Messina. Mark Pinese. October 13, Introduction The problem Example: Designing a colon cancer screening test...

1. To review research methods and the principles of experimental design that are typically used in an experiment.

The Chancellor, Masters and Scholars of the University of Cambridge, Clara East, The Old Schools, Cambridge CB2 1TN

Lesson: A Ten Minute Course in Epidemiology

arxiv: v1 [stat.ap] 6 Oct 2010

Package citccmst. February 19, 2015

Discovery of Novel Human Gene Regulatory Modules from Gene Co-expression and

Supplementary Note. Nature Genetics: doi: /ng.2928

PBSI-EHR Off the Charts!

Binary Diagnostic Tests Two Independent Samples

Understanding Uncertainty in School League Tables*

Hour 2: lm (regression), plot (scatterplots), cooks.distance and resid (diagnostics) Stat 302, Winter 2016 SFU, Week 3, Hour 1, Page 1

Stat 13, Lab 11-12, Correlation and Regression Analysis

Part D PDE Data and the Opioid Epidemic

A SAS Format Catalog for ICD-9/ICD-10 Diagnoses and Procedures: Data Research Example and Custom Reporting Example

CONTRACTING ORGANIZATION: The Chancellor, Masters and Scholars of the University of Cambridge, Clara East, The Old Schools, Cambridge CB2 1TN

White Rose Research Online URL for this paper: Version: Supplemental Material

Lab #7: Confidence Intervals-Hypothesis Testing (2)-T Test

SIMULATED DATA. CMH Argentina. FC Brazil. FA Chile. IHSS/HE Honduras. INCMNSZ Mexico. IMTAvH Peru. Combined

LAB ASSIGNMENT 4 INFERENCES FOR NUMERICAL DATA. Comparison of Cancer Survival*

Package CompareTests

Binary Diagnostic Tests Paired Samples

Presenting Survey Data and Results"

Training Peaks P90X and P90X2 Workout Schedule Instructions

ANCR. PC-NORDCAN version 2.4

Vega: Variational Segmentation for Copy Number Detection

SubLasso:a feature selection and classification R package with a. fixed feature subset

Package AbsFilterGSEA

Today: Binomial response variable with an explanatory variable on an ordinal (rank) scale.

A Quick-Start Guide for rseqdiff

CONTRACTING ORGANIZATION: The Chancellor, Masters and Scholars of the University of Cambridge, Clara East, The Old Schools, Cambridge CB2 1TN

Package flowtype. R topics documented: July 18, Type Package. Title Phenotyping Flow Cytometry Assays. Version

Figure S1. Analysis of endo-sirna targets in different microarray datasets. The

Package msgbsr. January 17, 2019

GeneOverlap: An R package to test and visualize

Package xseq. R topics documented: September 11, 2015

CHAPTER 8 EXPERIMENTAL DESIGN

Market Research on Caffeinated Products

Package CLL. April 19, 2018

1 Diagnostic Test Evaluation

2016 Children and young people s inpatient and day case survey

TADA: Analyzing De Novo, Transmission and Case-Control Sequencing Data

Section 6: Analysing Relationships Between Variables

IMPaLA tutorial.

SUPPLEMENTARY APPENDICES FOR ONLINE PUBLICATION. Supplement to: All-Cause Mortality Reductions from. Measles Catch-Up Campaigns in Africa

Lab 5: Testing Hypotheses about Patterns of Inheritance

Blue Distinction Centers for Fertility Care 2018 Provider Survey

Day 11: Measures of Association and ANOVA

Stat 13, Intro. to Statistical Methods for the Life and Health Sciences.

HOLIDAY SURVEY

Answers to end of chapter questions

Introduction to SPSS: Defining Variables and Data Entry

Technical Bulletin. Technical Information for Quidel Molecular Influenza A+B Assay on the Bio-Rad CFX96 Touch

Relationship between genomic features and distributions of RS1 and RS3 rearrangements in breast cancer genomes.

CNV PCA Search Tutorial

Batch Upload Instructions

ECDC HIV Modelling Tool User Manual version 1.0.0

One-Way Independent ANOVA

Below, we included the point-to-point response to the comments of both reviewers.

metaseq: Meta-analysis of RNA-seq count data

COAHOMA COUNTY SCHOOL DISTRICT Application for Interim Superintendent of Schools

APPENDIX D STRUCTURE OF RAW DATA FILES

The Tail Rank Test. Kevin R. Coombes. July 20, Performing the Tail Rank Test Which genes are significant?... 3

Bayes Factors for t tests and one way Analysis of Variance; in R

5 Listening to the data

Using the New Nutrition Facts Label Formats in TechWizard Version 5

Introduction to SPSS S0

ClinicalTrials.gov a programmer s perspective

Item Response Theory for Polytomous Items Rachael Smyth

Designed Experiments have developed their own terminology. The individuals in an experiment are often called subjects.

The Savvy Woman's Guide To PCOS (Polycystic Ovarian Syndrome): The Many Faces Of A 21st Century Epidemic...And What You Can Do About It By Elizabeth

Nature Medicine doi: /nm.2830

AIMS: Absolute Assignment of Breast Cancer Intrinsic Molecular Subtype

Author's response to reviews

Reshaping of Human Fertility Database data from long to wide format in Excel

bivariate analysis: The statistical analysis of the relationship between two variables.

Please revise your paper to respond to all of the comments by the reviewers. Their reports are available at the end of this letter, below.

Transcription:

Checking the Clinical Information for Docetaxel Keith A. Baggerly and Kevin R. Coombes November 13, 2007 1 Introduction In their reply to our correspondence, Potti and Nevins note that there is now more supplementary info on their website, http://data.cgt.duke.edu/naturemedicine.php, which should hopefully allow some of our disagreements to be resolved. With respect to docetaxel, one of the first of these new files is Chang clinical data.doc, which gives the clinical information (including the sensitive/resistant status) that they are applying to the samples at hand. This clinical information is relevant because it supplies an explantion for why Potti et al use 13 sensitive and 11 resistant samples when Chang et al report 11 sensitive and 13 resistant. Potti et al define samples to be sensitive if there is less than 0% residual disease, whereas Chang et al use a cutoff of 25% residual disease. The clinical information that Chang et al provide is found in two places: 1. in Table 1 of Chang et al, and 2. in the table of supplementary information for Chang et al available from the Lancet website, http: //image.thelancet.com/extras/01art11086webtable.pdf. The latter table gives the percent residual disease, and expression values for 92 probesets for each of the 2 samples profiled. Here, we simply load the information to check the results. For all 3 of the files (the clinical info doc file from Duke, and the two Chang et al Lancet tables) we have converted the contents to csv files for easier loading. 2 Options and Libraries > options(width = 80) 3 Checking Data We begin with the basic clinical info found in Table 1 of Chang et al. > changtable1 <- read.table(file.path("otherdata", "changtable1.csv"), + sep = ",", header = TRUE) > changtable1[1:3, ] 1

rep02-checkingdoceclinical.rnw 2 Patient Age..years. Menopausal.status Ethnic.origin 1 1 37 Premenopausal Hispanic 2 2 55 Postmenopausal Hispanic 3 3 1 Premenopausal Black Bidimensional.tumour.size..cm. Clinical.axillary.nodes 1 10x10 No 2 10x8 Yes 3 6x5 Yes Oestrogen..receptor.status Progesterone..receptor.status HER.2 Tumour.type 1 - - - IMC 2 - - + IDC 3 + + - IDC > changclinicalfromduke <- read.table(file.path("dukewebsite", + "changclinicalfromduke.csv"), sep = ",", header = TRUE) > changclinicalfromduke[1:3, ] Patient Age..years. Menopausal.status Ethnic.origin 1 1 37 Premenopausal Hispanic 2 2 55 Postmenopausal Hispanic 3 3 1 Premenopausal Black Bidimensional.tumour.size..cm. Clinical.axillary.nodes 1 10x10 No 2 10x8 Yes 3 6x5 Yes Oestrogen..receptor.status Progesterone..receptor.status HER.2 Tumour.type 1 - - - IMC 2 - - + IDC 3 + + - IDC Percent.Residual.Tumor Sens.or.Res..S.is.less.than.0pct. 1 1 S 2 1 S 3 6 S > all(changtable1 == changclinicalfromduke[, 1:10]) [1] TRUE There are 10 columns of data in Chang et al s Table 1. These are the the first 10 columns in the clinical information from Duke (in the same order), and all of the values agree. Next we check the information on (a) Sensitive/Resistant status and (b) percent residual disease, both of which are given in the supplementary table. > changsupp <- read.table(file.path("otherdata", "01art11086webtable_rev.csv"), + sep = ",") > dim(changsupp) [1] 96 29 > changsupp[1:5, ]

rep02-checkingdoceclinical.rnw 3 V1 V2 V3 V 1 Sample number 2 Residual tumour, % 3 Sensitive or resistant Probe set GenBank Locus Link Official Symbol 5 1008_f_at U5068 5610 PRKR V5 V6 1 N1 2 1 3 S Gene name 5 protein kinase, interferon-inducible double stranded RNA dependent 27.5 V7 V8 V9 V10 V11 V12 V13 V1 V15 V16 V17 1 N2 N3 N N5 N6 N7 N8 N9 N10 N11 N12 2 1 6 6 13 1 16 17 18 22 25 36 3 S S S S S S S S S S R 5 19.39 321.16 301.7 765.01 659.16 62. 816.97 385.68 398.58 227.32 355.79 V18 V19 V20 V21 V22 V23 V2 V25 V26 V27 1 N13 N1 N15 N16 N17 N18 N19 N20 N21 N22 2 38 39 5 7 60 6 65 70 100 3 R R R R R R R R R R 5 1392.68 1069.56 1118.6 1580.86 739.08 502.23 772.62 929.03 1892.9 06.26 V28 V29 1 N23 N2 2 100 131 3 R R 5 528.1 638.0 The supplementary table has four header lines of explanatory information, but not all headers are relevant for all columns. The first five columns of the supplementary table give identifying information for the Affymetrix probesets described (Probe Set, GenBank, Locus Link, Official Symbol, Gene Name) and the last 2 columns give the sample specific information. What we want here are the first three rows of header information for the last 2 (sample specific) columns. > temp <- t(changsupp[1:3, 6:29]) > colnames(temp) <- t(as.character(changsupp[1:3, 1])) > temp[c(1, 2, 23, 2), ] V6 "N1 " "1" "S " V7 "N2 " "1" "S " V28 "N23 " "100" "R " V29 "N2 " "131" "R " > temp[, 3] <- substr(temp[, 3], 1, 1) > temp[, 1] <- substr(temp[, 1], 1, nchar(temp[, 1]) - 1) > temp[c(1, 2, 23, 2), ]

rep02-checkingdoceclinical.rnw V6 "N1" "1" "S" V7 "N2" "1" "S" V28 "N23" "100" "R" V29 "N2" "131" "R" > changsuppclinical <- as.data.frame(temp, stringsasfactors = FALSE) > changsuppclinical[, 2] <- as.numeric(changsuppclinical[, 2]) > rownames(changsuppclinical) <- NULL > changsuppclinical[1:3, ] 1 N1 1 S 2 N2 1 S 3 N3 6 S The supplementary table labels the samples N1 through N2, as opposed to 1 through 2 in Table 1, but other than that these look equivalent. > all(changsuppclinical[, "Sample number"] == paste("n", changclinicalfromduke[, + "Patient"], sep = "")) [1] TRUE For the Sensitive/Resistant calls, we expect to see two differences between the tables, corresponding to the two relabeled samples. > which(changsuppclinical[, "Sensitive or resistant "]!= changclinicalfromduke[, + "Sens.or.Res..S.is.less.than.0pct."]) [1] 12 13 We expect the percent residual tumor values to line up across tables. > which(changsuppclinical[, "Residual tumour, % "]!= changclinicalfromduke[, + "Percent.Residual.Tumor"]) [1] 1 > changsuppclinical[11:15, ] 11 N11 25 S 12 N12 36 R 13 N13 38 R 1 N1 39 R 15 N15 R > changclinicalfromduke[11:15, c(1, 11, 12)]

rep02-checkingdoceclinical.rnw 5 Patient Percent.Residual.Tumor Sens.or.Res..S.is.less.than.0pct. 11 11 25 S 12 12 36 S* 13 13 38 S* 1 1 0 R 15 15 R Unexpectedly, there is a discrepancy. The percent residual tumor for patient 1 has changed from 39 percent (Chang) to 0 percent (Potti). This does have a minor effect on classification, as 0 percent was set as the cutoff for the Sensitive/Resistant divide. Using the Duke cutoff with the numbers from Chang et al says there should be 1 Sensitive patients, not 13. Appendix.1 Saves.2 SessionInfo > sessioninfo() R version 2.5.1 (2007-06-27) i386-pc-mingw32 locale: LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United Sta attached base packages: [1] "stats" "graphics" "grdevices" "utils" "datasets" "methods" [7] "base" other attached packages: R.matlab R.oo "1.1.3" "1.3.0"