Predicting Non-Small Cell Lung Cancer Diagnosis and Prognosis by Fully Automated Microscopic Pathology Image Features
Kun-Hsing Yu, MD, PhD
Department of Biomedical Informatics, Harvard Medical School
November 5th, 2017

Non-Small Cell Lung Cancer
- Heavy disease burden
  - 85% of lung cancer; >2.1M new cases/year
  - No. 1 cause of cancer-related deaths; >1M deaths/year (2 deaths/minute)
- Diverse clinical outcome
  - Same histopathology-defined subtypes → different survival outcomes
Jemal A et al. CA Cancer J Clin. 2011 Mar-Apr;61(2):69-90.
Bianchi F et al. J Clin Invest. 2007 Nov;117(11):3436-3444.

Histopathology
- Definitive diagnosis of many complex diseases
- Performed by trained pathologists
- Defines disease types, but can be subjective
- Automated image processing pipelines enable the extraction of objective features
Beck AH et al. Sci Transl Med. 2011 Nov 9;3(108):108ra113.
Image: National Institutes of Health

Extracting Nuclei / Cytoplasm Features
- Features: area, compactness, eccentricity, major/minor axis length, perimeter, pixel intensity distribution, Haralick texture features, nucleus/cytoplasm ratio, etc.
- Statistics: mean, median, percentiles, variance
- Features × statistics; total: 9,879 features

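The "features × statistics" idea above can be sketched as follows: each per-object measurement (for example, the area of every nucleus in a tile) is collapsed into a handful of image-level summary values. This is a minimal illustration and not the pipeline used in the study; the function name `summarize`, the chosen percentiles, and the sample areas are all hypothetical.

```python
import statistics

def summarize(values, percentiles=(10, 25, 75, 90)):
    """Collapse per-object measurements into image-level summary statistics."""
    ordered = sorted(values)
    summary = {
        "mean": statistics.fmean(ordered),
        "median": statistics.median(ordered),
        "variance": statistics.pvariance(ordered),  # population variance
    }
    # With n=100, statistics.quantiles returns the 99 cut points P1..P99.
    cuts = statistics.quantiles(ordered, n=100, method="inclusive")
    for p in percentiles:
        summary[f"p{p}"] = cuts[p - 1]
    return summary

# Hypothetical nuclear areas (in pixels) measured in one image tile.
nuclear_areas = [120.0, 135.0, 150.0, 160.0, 410.0]

# One measurement type yields several image-level features.
image_features = {
    f"nucleus_area_{name}": value
    for name, value in summarize(nuclear_areas).items()
}
```

Repeating this for every measurement type (shape, intensity, texture) and every compartment (nucleus, cytoplasm) is one way a few hundred raw measurements could expand into thousands of image-level features.
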
Machine Learning Methods
- Supervised learning
  - Decision trees
  - Naïve Bayes (NB) classifiers
  - Support vector machines (SVM)
  - Ensemble: random forests
  - Elastic net-Cox survival models
- Feature selection
  - Information content measurements
Yu KH et al. American Medical Informatics Association 2013 Annual Symposium.

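The results slides name information gain ratio as the feature selection criterion. A minimal sketch of that measure for an already-discretized feature is shown below; the binning into "low"/"high" and the toy labels are assumptions made for illustration, and the study's own discretization is not described in these slides.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def gain_ratio(feature_bins, labels):
    """Information gain of the class given the feature, divided by the
    entropy of the feature itself (the split information)."""
    total = len(labels)
    conditional = 0.0
    for b in set(feature_bins):
        subset = [y for x, y in zip(feature_bins, labels) if x == b]
        conditional += len(subset) / total * entropy(subset)
    info_gain = entropy(labels) - conditional
    split_info = entropy(feature_bins)
    return info_gain / split_info if split_info > 0 else 0.0

# Toy data: four images, one informative and one uninformative feature.
labels = ["LUAD", "LUAD", "LUSC", "LUSC"]
informative = ["low", "low", "high", "high"]
uninformative = ["low", "high", "low", "high"]

ratio_informative = gain_ratio(informative, labels)
ratio_uninformative = gain_ratio(uninformative, labels)
```

Ranking all features by this score and keeping the top k gives a simple filter-style selection step that runs before any classifier is trained.
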
Evaluations
- Cross-validation
  - Parameter estimation
- Evaluate by independent test sets
  - Held-out datasets from TCGA
  - An external validation set from the Stanford Tissue Microarray (TMA) Database
Yu KH et al. Nat Commun. 2016 Aug 16;7:12474.

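As a rough illustration of the cross-validation step, the sketch below partitions sample indices into k folds so that each sample is used for validation exactly once. The fold count, seed, and sample size are hypothetical; the held-out test set and the external TMA cohort would be kept entirely outside this loop.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, validation_indices) pairs for k-fold CV."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation

# Hypothetical training cohort of 10 samples split into 5 folds.
splits = list(k_fold_indices(n_samples=10, k=5))
```

Model parameters would be tuned by averaging performance over the validation folds, and only the final chosen configuration would be scored on the independent test sets.
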
Examining the Utility of the Extracted Features
- Diagnosis classification
  - Histopathology features define cancer types
  - A useful set of features should be able to recapitulate diagnostic patterns
  - Pathology evaluation is laborious and subjective: κ = 0.55-0.59 for classifying LUAD and LUSC [1]
[1] Grilley-Olson JE, et al. Archives of Pathology & Laboratory Medicine 137, 32-40 (2013).

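To make the κ statistic concrete, here is a minimal sketch of Cohen's kappa for two raters: observed agreement corrected for the agreement expected by chance. The two-rater form is an assumption; the cited study may have used a multi-rater variant, and the ratings below are invented for illustration.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same cases.

    Undefined (division by zero) when chance agreement equals 1.
    """
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of each rater's marginal label frequencies.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Invented ratings: two pathologists agree on 8 of 10 slides.
pathologist_1 = ["LUAD"] * 5 + ["LUSC"] * 5
pathologist_2 = ["LUAD"] * 4 + ["LUSC"] + ["LUSC"] * 4 + ["LUAD"]

kappa = cohen_kappa(pathologist_1, pathologist_2)
```

In this toy case the raw agreement is 80%, but half of that would be expected by chance alone, which is why kappa is noticeably lower than the raw agreement rate.
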
Fully Automated Image Features Identified Images with Malignant Cells
[Figure: ROC curves (sensitivity vs. 1 - specificity), with 80 features selected by information gain ratio. Panels: (A) LUAD versus Benign; (B) LUSC versus Benign. Classifiers in both panels: Bagging, Naive Bayes, Random Forest, Random Forest with CITs, SVMs with Gaussian Kernel, SVMs with Linear Kernel, SVMs with Polynomial Kernel. AUC ranges shown on the plots: 0.73-0.85 and 0.77-0.88.]
- Top features: radial distribution of nuclei pixels; textures (pixel correlations, intensity variance) of the nuclei
Yu KH et al. Nat Commun. 2016 Aug 16;7:12474.

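The AUC values reported on these plots can be read as the probability that a randomly chosen positive image receives a higher classifier score than a randomly chosen negative one. A minimal sketch of that pairwise definition follows; the scores and labels are invented, and the quadratic loop is for clarity rather than speed.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Invented classifier scores: 1 = malignant tile, 0 = benign tile.
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
labels = [1, 1, 1, 0, 0, 0]

auc = roc_auc(scores, labels)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives context for the ranges quoted on the slide.
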
Image Features Distinguished the Two Types of Lung Malignancy
[Figure: ROC curves (sensitivity vs. 1 - specificity), with 240 features selected by information gain ratio. (A) TCGA dataset: AUC 0.7. Classifiers: Bagging, Naive Bayes, Random Forest, Random Forest with CITs, SVMs with Gaussian Kernel, SVMs with Linear Kernel, SVMs with Polynomial Kernel. (B) TMA dataset: AUC=0.73-0.85. Classifiers: Bagging, Conditional Inference Trees, Naive Bayes, Random Forest, Random Forest with CITs, SVMs with Gaussian Kernel, SVMs with Linear Kernel, SVMs with Polynomial Kernel, SVMs with Sigmoid Kernel.]
- Top features: intensity distribution in the nuclei; textures of the nuclei
Yu KH et al. Nat Commun. 2016 Aug 16;7:12474.

Stage and Grade are Insufficient to Predict Adenocarcinoma Patient Survival
[Figure: survival curves (probability of survival vs. months, 0-200). (A) Survival stratified by stage (Stage I, II, III, IV), P<0.01. (B) Stage I patient survival stratified by histology grade (Grade 1, 1-2, 2, 2-3, 3), P=0.06.]
- Great diversity in Stage I patient survival
- Pathology grade did NOT significantly correlate with survival
Yu KH et al. Nat Commun. 2016 Aug 16;7:12474.

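Survival curves of this kind are conventionally drawn with the Kaplan-Meier estimator, which multiplies together the conditional probabilities of surviving each observed death time while accounting for censored patients. The sketch below assumes that estimator and uses invented follow-up data; it is not taken from the study's code.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each time with a death.

    events[i] is 1 if patient i died at times[i], 0 if censored then.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve

# Invented follow-up times in months; 0 marks a censored patient.
months = [5, 8, 12, 12, 20]
died = [1, 0, 1, 1, 0]

curve = kaplan_meier(months, died)
```

Censored patients lower the number at risk without causing a drop in the curve, which is why the step sizes grow as follow-up gets longer and fewer patients remain.
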
Image Features Predicted Prognosis in Stage I Adenocarcinoma Patients
[Figure: survival curves (probability of survival vs. months) for predicted prognostic groups (longer-term survivors vs. shorter-term survivors). (A) Image features predicted the prognosis of TCGA stage I patients, P=23, 0-200 months. (B) Validated in TMA, P=0.028, 0-150 months.]
- Quantitative features predicted survival, validated in TMA
- Top features: Zernike shape features of the nuclei; intensity distribution in the cytoplasm
Yu KH et al. Nat Commun. 2016 Aug 16;7:12474.

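The slide does not name the test behind its P values; a log-rank test is the usual choice for comparing two survival curves, so the sketch below assumes it. At every death time it compares the deaths observed in one group with the number expected if both groups shared the same hazard. The follow-up data are invented.

```python
import math

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi_square, p_value) with 1 df.

    Raises ZeroDivisionError if no death time has patients at risk in
    both groups (the variance is then zero).
    """
    observed_minus_expected = 0.0
    variance = 0.0
    pooled = zip(times_a + times_b, events_a + events_b)
    death_times = sorted({t for t, e in pooled if e == 1})
    for t in death_times:
        n_a = sum(1 for ti in times_a if ti >= t)  # at risk in group A
        n_b = sum(1 for ti in times_b if ti >= t)  # at risk in group B
        d_a = sum(1 for ti, e in zip(times_a, events_a) if ti == t and e == 1)
        d_b = sum(1 for ti, e in zip(times_b, events_b) if ti == t and e == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    chi_square = observed_minus_expected ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value

# Invented follow-up (months) for two predicted prognostic groups.
shorter_t, shorter_e = [6, 7, 10, 15, 19], [1, 1, 1, 1, 0]
longer_t, longer_e = [20, 25, 30, 32, 40], [1, 0, 1, 1, 0]

chi_square, p_value = logrank_test(shorter_t, shorter_e, longer_t, longer_e)
```

A small P value indicates that the separation between the predicted longer-term and shorter-term survivor curves is unlikely under the hypothesis of identical survival in both groups.
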
Stage and Grade are Insufficient to Predict LUSC Patient Survival
[Figure: survival curves (probability of survival vs. months, 0-150). (A) Survival stratified by stage (Stage I, II, III, IV), P=0.191. (B) Stage I patient survival stratified by histology grade (Grade 1, 1-2, 2, 2-3, 3, 3-4, 4), P=0.847.]
- Neither pathology stage nor grade was significantly associated with survival
Yu KH et al. Nat Commun. 2016 Aug 16;7:12474.

Image Features Predicted Prognosis in Squamous Cell Carcinoma Patients
[Figure: survival curves (probability of survival vs. months, 0-150) for predicted prognostic groups (longer-term survivors vs. shorter-term survivors). (A) Image features predicted prognosis of TCGA patients, P=0.023. (B) Validated in TMA, P=0.035.]
- Quantitative features predicted survival, validated in TMA
- Top features: Zernike shape features and textures of the nuclei
Yu KH et al. Nat Commun. 2016 Aug 16;7:12474.

Summary
- Developed a fully automated algorithm to extract quantitative features from histopathology images
- Demonstrated the utility of texture and shape features in prognosis prediction

Acknowledgements
- Zak Lab: Isaac S. Kohane, MD, PhD; Nathan Palmer, PhD; Arjun Manrai, PhD; William Yuan, MS; Oren Miron, MS; Sam Finlayson, MS; Vincent Hu, BS; Samantha Lemos, BA
- Snyder Lab: Michael Snyder, PhD; Jingjing Li, PhD; Collin Melton, PhD; Konrad Karczewski, PhD
- Altman Lab: Russ B. Altman, MD, PhD; Bethany Percha, PhD; Weizhuang Zhou, MS; Emily Mallory, MS
- Ré Lab: Christopher Ré, PhD; Ce Zhang, PhD; Feiran Wang, MS
- Collaborators: Daniel Rubin, MD, MS; Gerald Berry, MD; Matt van de Rijn, MD, PhD
- Funding: Harvard Data Science Fellowship; Howard Hughes Medical Institute (HHMI) International Student Research Fellowship; Stanford Graduate Fellowship (SGF)

Thank you.
khyu@stanford.edu