Gene expression correlates of clinical prostate cancer behavior

Gene expression correlates of clinical prostate cancer behavior. Cancer Cell 2002; 1: 203-209.
Singh D, Febbo P, Ross K, Jackson D, Manola J, Ladd C, Tamayo P, Renshaw A, D'Amico A, Richie J, Lander E, Loda M, Kantoff P, Golub T, Sellers W.
Topics in Bioinformatics, Robert Kazmierski, Oct 12, 2004

Primers
- Gleason score
- Signal-to-noise metric
- k-nn
- Leave-one-out CV (continued)
- Permutation testing (Part 1)

Gleason Score (from http://www.prostateinfo.com)
1: Simple round glands, closely packed in rounded masses with well-defined edges.
2: Simple round glands, loosely packed in vague, rounded masses with loosely packed edges.
3A: Medium-sized single glands of irregular shape and irregular spacing with ill-defined infiltrating edges.
3B: Very similar to 3A, but with small to very small glands, which must not form significant chains or cords.
3C: Papillary and cribriform epithelium in smooth, rounded cylinders and masses; no necrosis.
4A: Small, medium, or large glands fused into cords, chains, or ragged, infiltrating masses.
4B: Very similar to 4A, but with many large clear cells, sometimes resembling "hypernephroma."
5A: No glandular differentiation; solid sheets, cords, single cells, or solid nests of tumor with central necrosis.
5B: Anaplastic adenocarcinoma in ragged sheets.

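Gleason Score
- The pathologist examines the two areas (patterns) that make up the largest portions of the tumor tissue.
- Each area is assigned a grade from 1 to 5 (the patterns above), and the sum of the two grades is the Gleason score, which therefore ranges from 2 to 10 (e.g., 3 + 4 = 7).
- It answers how aggressive the cancer in the prostate tissue is: a high score means a poorly differentiated, more aggressive tumor; a low score means a well-differentiated, less aggressive one.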

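Signal-to-Noise Metric (from Golub et al. 1999)

$$S2N = \frac{\mu_{Class0} - \mu_{Class1}}{\sigma_{Class0} + \sigma_{Class1}}$$

where µ is the mean and σ is the standard deviation of a gene's expression within each class. In our case, Class 0 is cancer and Class 1 is normal.
- S2N is a score for ranking genes by how differently they are expressed in the two classes; the statistical significance of a score is assessed separately (see permutation testing).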

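Signal-to-Noise Metric vs. t-Test
S2N differs from the (pooled two-sample) t-test:

$$t = \frac{\mu_{Class0} - \mu_{Class1}}{\sqrt{\dfrac{(n_{Class0}-1)\,\sigma_{Class0}^2 + (n_{Class1}-1)\,\sigma_{Class1}^2}{n_{Class0} + n_{Class1} - 2}}}\;\sqrt{\frac{n_{Class0}\, n_{Class1}}{n_{Class0} + n_{Class1}}}$$

Here n is the number of samples in each class. S2N does not take n into account.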
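To make the difference concrete, here is a minimal Python sketch that computes both scores for a single gene. The function names `s2n` and `pooled_t` and the toy values are hypothetical, and the sketch assumes sample standard deviations (n - 1 denominator); the slides do not say which form is used.

```python
from math import sqrt
from statistics import mean, stdev


def s2n(class0, class1):
    """Signal-to-noise score: difference of class means over the sum of class SDs.
    Assumes sample standard deviations (n - 1 denominator)."""
    return (mean(class0) - mean(class1)) / (stdev(class0) + stdev(class1))


def pooled_t(class0, class1):
    """Two-sample t statistic with pooled variance (equal-variance t-test)."""
    n0, n1 = len(class0), len(class1)
    pooled_var = ((n0 - 1) * stdev(class0) ** 2 + (n1 - 1) * stdev(class1) ** 2) / (n0 + n1 - 2)
    return (mean(class0) - mean(class1)) / sqrt(pooled_var * (1 / n0 + 1 / n1))


# Two hypothetical data sets with the same class means (6 vs. 2) and the same
# class SDs (1), but different numbers of samples per class.
small_tumor, small_normal = [5.0, 6.0, 7.0], [1.0, 2.0, 3.0]
large_tumor, large_normal = [5.0, 5.0, 6.0, 7.0, 7.0], [1.0, 1.0, 2.0, 3.0, 3.0]

print(s2n(small_tumor, small_normal), s2n(large_tumor, large_normal))            # 2.0 2.0
print(pooled_t(small_tumor, small_normal), pooled_t(large_tumor, large_normal))  # ~4.90 ~6.32
```

S2N is identical for both data sets, while t grows with the number of samples; this is the practical meaning of "S2N does not take n into account".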

k-Nearest Neighbor
Given:
- d(x, x'): a metric measuring the distance between samples (Euclidean distance is used here)
- L: the training set
f_knn(x) = the majority class among the k nearest neighbors of x in L,
where x is a sample to be classified (cancer vs. normal) and k specifies how many neighbors to consider. k is usually an odd number, so that a vote between two classes cannot tie.

3-Nearest Neighbor: of the 3 nearest neighbors, 2 are blue and 1 is red, so the green point is classified as blue.

5-Nearest Neighbor: of the 5 nearest neighbors, 2 are blue and 3 are red, so the same green point is classified as red.
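A minimal Python sketch of the k-NN rule defined above, using Euclidean distance and a majority vote. The function name `knn_predict` and the toy coordinates are hypothetical; they were chosen only to reproduce the two pictures, where the same green point is labeled blue with k = 3 and red with k = 5.

```python
from collections import Counter
from math import dist


def knn_predict(train_x, train_y, x, k):
    """Return the majority class among the k training samples nearest to x."""
    # Rank training samples by Euclidean distance to the query point.
    order = sorted(range(len(train_x)), key=lambda i: dist(train_x[i], x))
    votes = Counter(train_y[i] for i in order[:k])
    # With two classes and odd k a tie cannot occur; otherwise most_common()
    # returns the tied label that was counted first, i.e. first in distance order.
    return votes.most_common(1)[0][0]


# Hypothetical 2-D layout: the green query point sits at the origin.
points = [(1.0, 0.0), (0.0, 1.5), (0.0, -1.2), (2.0, 0.0), (-2.5, 0.0), (3.0, 3.0)]
colors = ["blue", "blue", "red", "red", "red", "blue"]
green = (0.0, 0.0)

print(knn_predict(points, colors, green, k=3))  # blue (2 blue, 1 red)
print(knn_predict(points, colors, green, k=5))  # red  (2 blue, 3 red)
```

The neighbors sorted by distance are blue (1.0), red (1.2), blue (1.5), red (2.0), red (2.5), so raising k from 3 to 5 flips the vote, which is exactly why the result depends on the choice of k.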

k-Nearest Neighbor
- The result is influenced by the choice of k, the training set, and the distance measure
- It is one of the most basic classification tools

Leave-One-Out CV (continued)
- A method for estimating the error rate of a classifier
- Every step that uses the class labels (e.g., selecting genes or choosing k) must be done inside the CV loops
- Adjustments made outside the loops lead to incorrect error estimates (usually too low)

Leave-One-Out CV
Testing the effectiveness of a k-nn classifier on 102 samples (k is set to 3; the 10 most significant genes are used in k-nn):
1. Set 1 sample aside.
2. Compute S2N on the remaining 101 samples to find the 10 genes that differ most significantly between the two classes.
3. Classify the held-out sample as Class0 or Class1 with 3-NN (Euclidean distance) on those 10 genes.
4. Record whether the classification is correct.
5. Repeat 101 more times, so that each sample is set aside once.
The results give the accuracy of using the 10 most significant genes (S2N metric) with 3-NN classification.
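The procedure above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the helper names and the toy data are hypothetical, genes are ranked by the absolute value of S2N (an assumption, since the slide does not say how genes higher in either class are combined), and a tiny floor on the denominator guards against division by zero. The key point is that gene selection is recomputed from the training samples inside every fold.

```python
from collections import Counter
from math import dist
from statistics import mean, stdev


def s2n(values, labels):
    """S2N for one gene, class 0 vs. class 1 (denominator floor is an assumption)."""
    c0 = [v for v, y in zip(values, labels) if y == 0]
    c1 = [v for v, y in zip(values, labels) if y == 1]
    return (mean(c0) - mean(c1)) / max(stdev(c0) + stdev(c1), 1e-12)


def top_genes(X, y, n_genes):
    """Indices of the n_genes genes with the largest |S2N| on the given samples."""
    scores = [abs(s2n([row[g] for row in X], y)) for g in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)[:n_genes]


def knn_predict(train_x, train_y, x, k):
    """Majority class among the k nearest training samples (Euclidean distance)."""
    order = sorted(range(len(train_x)), key=lambda i: dist(train_x[i], x))
    return Counter(train_y[i] for i in order[:k]).most_common(1)[0][0]


def loocv_accuracy(X, y, n_genes=10, k=3):
    """Leave-one-out accuracy with gene selection done inside each fold."""
    correct = 0
    for i in range(len(X)):
        train_X, train_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        genes = top_genes(train_X, train_y, n_genes)  # uses the training fold only
        train_sel = [[row[g] for g in genes] for row in train_X]
        test_sel = [X[i][g] for g in genes]
        correct += knn_predict(train_sel, train_y, test_sel, k) == y[i]
    return correct / len(X)


# Hypothetical toy data: 6 samples per class, 20 genes, every gene separates the classes.
X = [[10 + ((i + g) % 3) * 0.5 for g in range(20)] for i in range(6)] + \
    [[((i + 2 * g) % 3) * 0.5 for g in range(20)] for i in range(6)]
y = [0] * 6 + [1] * 6
print(loocv_accuracy(X, y, n_genes=10, k=3))  # 1.0 on this cleanly separable toy data
```

With the study's 102 samples, each call to `top_genes` would see 101 samples, matching the slide.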

Leave-One-Out CV
Testing the effectiveness of a k-nn classifier on 102 samples, now allowing k to be 3 or 9 (the 10 most significant genes are used in k-nn):
Outer loop: set 1 sample aside.
  Inner loop on the remaining 101 samples, run once with k = 3 and once with k = 9:
  - Set 1 further sample aside.
  - Compute S2N on the remaining 100 samples to find the 10 most significantly different genes in the two classes.
  - Classify the held-out sample as Class0 or Class1 with k-NN (Euclidean distance).
  - Record whether the classification is correct.
  - Repeat 100 more times (each of the 101 samples is set aside once).
  The best k is the one with the least error in the inner loop.
  Compute S2N on the 101 samples to find the 10 most significantly different genes.
  Classify the outer held-out sample as Class0 or Class1 with k-NN (Euclidean distance) using the best k.
  Record whether the classification is correct.
Repeat the outer loop 101 more times (each sample is set aside once).
The results give the accuracy of using the 10 most significant genes (S2N metric) with k-NN classification, where k ∈ {3, 9} is chosen inside the loop.

Leave-One-Out CV
Testing the effectiveness of a k-nn classifier on 102 samples, allowing k to be 3 or 9 and allowing the 10, 20, or 50 most significant genes to be used in k-nn:
- An inner loop selects k, as before
- An inner loop selects the number of genes used in k-NN classification
The results give the accuracy obtained when both the number of genes and the value of k are chosen as the most accurate options in the inner loops.
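The nested procedure from the last two slides can be sketched in Python as below. Again this is an illustrative assumption rather than the authors' implementation: `nested_loocv` and its helpers are hypothetical names, the inner loop searches the (number of genes, k) grid jointly instead of in two separate loops, genes are ranked by |S2N|, and ties in inner-loop accuracy go to the first grid entry. The outer held-out sample is never used for ranking genes or choosing parameters.

```python
from collections import Counter
from itertools import product
from math import dist
from statistics import mean, stdev


def s2n(values, labels):
    c0 = [v for v, y in zip(values, labels) if y == 0]
    c1 = [v for v, y in zip(values, labels) if y == 1]
    return (mean(c0) - mean(c1)) / max(stdev(c0) + stdev(c1), 1e-12)


def top_genes(X, y, n_genes):
    scores = [abs(s2n([row[g] for row in X], y)) for g in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)[:n_genes]


def fit_predict(train_X, train_y, x, n_genes, k):
    """Select genes on the training samples only, then classify x with k-NN."""
    genes = top_genes(train_X, train_y, n_genes)
    train_sel = [[row[g] for g in genes] for row in train_X]
    x_sel = [x[g] for g in genes]
    order = sorted(range(len(train_sel)), key=lambda i: dist(train_sel[i], x_sel))
    return Counter(train_y[i] for i in order[:k]).most_common(1)[0][0]


def loo_accuracy(X, y, n_genes, k):
    """Plain leave-one-out accuracy for a fixed (n_genes, k)."""
    hits = sum(
        fit_predict(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i], n_genes, k) == y[i]
        for i in range(len(X))
    )
    return hits / len(X)


def nested_loocv(X, y, gene_options=(10, 20, 50), k_options=(3, 9)):
    """Outer LOOCV estimates accuracy; the inner LOOCV picks (n_genes, k)."""
    correct, chosen = 0, []
    for i in range(len(X)):
        train_X, train_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        # Inner loop: sees only the outer training samples; max() keeps the first best.
        best = max(product(gene_options, k_options),
                   key=lambda p: loo_accuracy(train_X, train_y, *p))
        chosen.append(best)
        correct += fit_predict(train_X, train_y, X[i], *best) == y[i]
    return correct / len(X), chosen


# Hypothetical toy data: 6 samples per class, 20 genes, every gene separates the classes.
X = [[10 + ((i + g) % 3) * 0.5 for g in range(20)] for i in range(6)] + \
    [[((i + 2 * g) % 3) * 0.5 for g in range(20)] for i in range(6)]
y = [0] * 6 + [1] * 6
accuracy, chosen = nested_loocv(X, y, gene_options=(5, 10), k_options=(3, 9))
print(accuracy, chosen[0])  # 1.0 (5, 3)
```

On this toy data k = 9 loses inner-loop votes to the larger class, so the inner loop settles on k = 3 in every fold; the point of the design is that this choice is made without ever looking at the outer held-out sample.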

Permutation Testing (Part 1)
A method for determining whether a gene's correlation with the classes is significant, i.e., how likely it is that the gene would appear to be expressed more highly in cancer purely by chance.

Permutation Testing
- Randomly reassign the class labels (cancer, normal) to the samples' gene expression levels. This removes any real correlation between gene expression levels and class labels.
- Recompute the correlation measure (the S2N metric in this case) to see how strongly the gene data still appear correlated with the shuffled labels.
- Repeat the reassignment of class labels (permutation) and the test many times.
- If correlations as strong as the observed one appear frequently across permutations, the observed correlation is not significant.
- If correlations as strong as the observed one rarely appear across permutations, the observed correlation can be considered significant.
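A minimal Python sketch of a one-sided permutation test for a single gene's S2N score. Assumptions: `permutation_p_value` and the toy gene are hypothetical, the p-value is taken as the fraction of permutations whose score is at least the observed one (some implementations add 1 to both numerator and denominator), and the random seed is fixed only for reproducibility.

```python
import random
from statistics import mean, stdev


def s2n(values, labels):
    """Signal-to-noise score of one gene, class 0 vs. class 1."""
    c0 = [v for v, y in zip(values, labels) if y == 0]
    c1 = [v for v, y in zip(values, labels) if y == 1]
    return (mean(c0) - mean(c1)) / (stdev(c0) + stdev(c1))


def permutation_p_value(values, labels, n_perm=1000, seed=0):
    """One-sided test: how often does shuffling the labels give an S2N
    at least as large as the observed one (higher expression in class 0)?"""
    rng = random.Random(seed)
    observed = s2n(values, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any real link between expression and class
        if s2n(values, shuffled) >= observed:
            hits += 1
    return observed, hits / n_perm


# Hypothetical gene: clearly higher in the 10 "tumor" samples (class 0).
expression = [float(v) for v in range(10, 20)] + [float(v) for v in range(10)]
labels = [0] * 10 + [1] * 10
print(permutation_p_value(expression, labels))
```

For this toy gene almost no shuffle reaches the observed score, so the p-value is very small; flipping the labels makes the same gene look like it is higher in class 1, and the one-sided p-value then approaches 1.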

Singh et al.: Objectives
Identify genes in microarray expression analysis that might anticipate the clinical behavior of prostate cancer:
- Cancer vs. normal
- Recurrent vs. nonrecurrent

Motivation
- Prostate cancer is one of the most common cancers
- Early diagnosis increases the chances of survival
- Clinical tests (Gleason score, serum PSA) are not completely reliable
- Progress has already been made in linking differential gene expression to cancer (p53, myc, p27, PTEN)
- No gene has yet been found with sufficient prognostic utility to warrant clinical implementation

Methods
- 12,600 genes in microarray analysis
- 52 tumor and 50 normal prostate specimens (102 samples in total)

Tumor vs. Normal Classification
- Genes were ranked according to their differential expression across the 2 classes using the S2N metric
- The statistical significance of these rankings was determined with a permutation test (1000 permutations of the class labels)
- 317 genes had significantly higher expression in tumor samples
- 139 genes had significantly higher expression in normal samples
- The threshold p = 0.001 here means that, for each of these significantly different genes, a correlation at least as strong as the observed one appeared in at most 1 of the 1000 permutations

Figure: the top 50 genes with high expression in tumor and the top 50 genes with high expression in normal (red = above the mean of all samples, blue = below).

Tumor vs. Normal Classification
- A k-nn classifier was built using the significantly different genes
- Models using 4 or more genes classified samples with greater than 90% accuracy in leave-one-out CV (p < 0.001 as measured by permutation testing)
- The paper does not describe how leave-one-out CV was performed, nor does it give the value of k

Tumor vs. Normal Classification
- The 4-gene and 16-gene models were tested on an independent data set (8 normal, 27 tumor)
- 4-gene model accuracy: 77%
- 16-gene model accuracy: 86%

Prediction of Pathological Features
- Gene expression data within the 52 tumor samples were analyzed for correlations with clinical behavior
- Significance was determined by comparing the observed correlations with those in randomly permuted data sets
- A correlation was found only with Gleason score (GS)

Prediction of Pathological Features
- 15 genes had expression positively correlated with GS (Type I)
- 14 genes had expression negatively correlated with GS (Type II)
Figure: red = above mean, blue = below mean.

Prediction of Pathological Features
- The same 29 genes were used to drive hierarchical clustering of an independent data set
- Type I and Type II genes remained highly cosegregated, suggesting that this coexpression is reproducible

Prediction of Clinical Outcome
- 21 patients were evaluated with respect to recurrence following surgery: 8 relapsed, and 13 have remained cancer-free for 4 years
- No single gene was associated with recurrence
- k-nn classification with k = 2 on a 5-gene model gives 90% classification accuracy in leave-one-out CV
- Again, no information is provided on the CV methods
- No independent data were available for testing

Prediction of Clinical Outcome
- As a further test of significance, the class labels were permuted 1000 times, and the authors attempted to find multigene expression classifiers on each permutation using the same range of gene numbers
- 37 of the 1000 permutations yielded an accuracy of 90% or more, giving p = 37/1000 = 0.037

Discussion: Clinical Use?
- The accuracy in classifying tumor vs. normal is 86-92%
- While high, this is still not sufficient to replace histological examination
- No association was found between serum PSA and gene expression
- It is possible that more genes need to be included and/or more samples need to be evaluated

Discussion: GS Correlation
- GS was associated with patient outcome
- However, only 2 genes correlated with GS were used in the outcome prediction model
- The genes most frequently used in the model were not correlated with GS
- This suggests that GS-independent markers and determinants of prostate cancer behavior exist

Discussion: Recurrence Predictor Model
- The 5-gene model correctly predicted 19 of 21 evaluable patients (19/21 ≈ 90%)
- The authors concede that the model may be the result of overoptimization
- More data sets are needed for model validation
- Some of the genes commonly used in the model are known to be correlated with prostate cancer

Conclusion
- This is a proof-of-concept paper intended to motivate further research rather than to suggest changes in clinical practice
- The authors' use of leave-one-out CV is not well explained, and the very good results from the models could partly be an artifact of how it was applied (e.g., if genes were selected outside the CV loop)
- Independent testing helped validate the models