Repeatability and reproducibility of 18F-NaF PET quantitative imaging biomarkers

Christie Lin, Tyler Bradshaw, Timothy Perk, Stephanie Harmon, Glenn Liu, Robert Jeraj
University of Wisconsin-Madison, Department of Medical Physics
clin232@wisc.edu
NCCAAPM, Madison, WI, October 30, 2015

Introduction: NaF PET
18F-NaF PET, a surrogate of bone metabolism, was first introduced as an imaging agent for detecting bone lesions (Blau 1962).
18F-NaF exchanges with small hydroxyl ions (OH-) in the bone crystal, hydroxyapatite (Blau 1962).
Within minutes, the ion passes from the plasma through the extracellular fluid (ECF) and localizes in the shell of bound water surrounding each crystal.

Introduction: Imaging bone metastases
The need to detect bone metastases drives interest in identifying imaging biomarkers (Mick 2014).
18F-NaF PET has superior resolution and sensitivity compared with 99mTc bone scans (Even-Sapir 2006, Iagaru 2013).
The Quantitative Imaging Biomarkers Alliance (QIBA) states that quantifying repeatability and reproducibility is critical for accurate assessment of therapeutic response (Raunig 2014).
To date there has been one study of the repeatability of NaF PET imaging in humans (Kurdziel 2012), and it conducted imaging at a single center.

Introduction: Metastasis to the bone
Prostate and breast cancers preferentially metastasize to the bone.
More than 90% of metastatic prostate cancer (mPCa) patients develop bone metastases (Bubendorf 2000).
About 70% of metastatic breast cancer (mBC) patients develop bone metastases (Manders 2006).
Survival rates are low for metastatic cancer.
Patients with mPCa have a poor prognosis, with a median survival of 18-24 months from initial progression (Huang 2012).
Patients with mBC have a median survival of 55 months (Ahn 2013).
Because survival rates are higher with earlier detection, early diagnosis and treatment are crucial.

Research Objectives
Quantify the repeatability of 18F-NaF PET-derived standardized uptake value (SUV) and texture imaging features.
Identify the NaF PET-derived imaging features that are repeatable.
Quantify variability between imaging sites in a multicenter trial.
Establish response criteria for 18F-NaF PET-based treatment assessment.

Methods: Image acquisition
Scan acquisition: multicenter trial of 34 metastatic castrate-resistant prostate cancer patients.
Site       Patients   Bone lesions
Center 1   18         264
Center 2   10         67
Center 3   6          68
All        34         399
Test-retest whole-body NaF PET/CT scans were obtained.
[Figure: example test and retest scans]

Methods: Image acquisition
Osseous lesion segmentation: SUV threshold = 15
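As a rough illustration of threshold-based segmentation, the sketch below labels connected regions of a small 2D SUV grid that exceed the threshold of 15. Only the threshold value comes from the slide. The strict comparison (>), the 4-connectivity rule, the 2D toy grid, and the function name segment_lesions are assumptions made for illustration; the study's actual segmentation procedure is not described here.

```python
from collections import deque

SUV_THRESHOLD = 15.0  # value from the slide; using a strict ">" is an assumption


def segment_lesions(suv_slice, threshold=SUV_THRESHOLD):
    """Label 4-connected regions of a 2D SUV grid whose values exceed threshold.

    Returns (labels, count): labels has the same shape as suv_slice, with 0 for
    background and 1..count for each candidate region.
    """
    rows, cols = len(suv_slice), len(suv_slice[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if suv_slice[r][c] <= threshold or labels[r][c]:
                continue  # background voxel, or already assigned to a region
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill of the current region
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and suv_slice[ny][nx] > threshold
                            and not labels[ny][nx]):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


# Hypothetical toy slice containing two separate regions above the threshold.
toy_slice = [
    [1, 20, 22, 3],
    [2, 18, 4, 3],
    [1, 2, 3, 30],
    [0, 1, 2, 40],
]
toy_labels, toy_count = segment_lesions(toy_slice)
```

A real pipeline would work on 3D volumes and would likely restrict the search to bone; this sketch only shows the thresholding and labeling idea.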

Methods: Image feature extraction
Feature basis and features (Galavis 2010):
SUV: SUVmax, SUVmean, SUVtotal
First-order: Max, TLG, Volume, Stdev, Variance, CoV, Skewness, Kurtosis, Energy, Entropy
Co-occurrence matrix: Angular Moment, Contrast-GLCM, Correlation, Sum of Squares Variance, Inverse Difference Moment, Sum Average, Sum Variance, Sum Entropy, Entropy-GLCM, Difference Variance, Difference Entropy, Information Measure of Correlation 1, Information Measure of Correlation 2, Maximal Correlation Coefficient, Maximum Probability, Diagonal Moment, Dissimilarity, Difference Energy, Inertia, Inverse Difference Moment, Sum Energy, Cluster Shade, Cluster Prominence
Gray level run length: Small Run Emphasis, Long Run Emphasis, Gray-Level Nonuniformity, Run Length Nonuniformity, Run Percentage, Low Gray-Level Emphasis, High Gray-Level Emphasis, Short Run Low Gray-Level Emphasis, Short Run High Gray-Level Emphasis, Long Run Low Gray-Level Emphasis, Long Run High Gray-Level Emphasis
Neighboring gray level: Small Number Emphasis, Large Number Emphasis, Number Nonuniformity, Second Moment, Entropy-NGL
Neighborhood gray tone difference matrix: Coarseness, Contrast-NGL, Busyness
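To make the SUV and first-order feature families concrete, here is a minimal sketch that computes a few of them from the voxel SUVs of a single lesion. The definitions are common textbook forms and are assumptions rather than the study's exact formulas: SUVtotal is taken as the sum of voxel SUVs times voxel volume, moments are population (biased) moments, and kurtosis is not excess-corrected. The function name and the voxel_volume_ml parameter are hypothetical.

```python
import math


def first_order_features(suv_values, voxel_volume_ml=1.0):
    """Compute a handful of SUV and first-order features for one lesion.

    suv_values: flat list of voxel SUVs inside the lesion (needs some spread,
    otherwise skewness and kurtosis are undefined).
    """
    n = len(suv_values)
    mean = sum(suv_values) / n
    # Population central moments of order 2, 3, and 4.
    m2 = sum((v - mean) ** 2 for v in suv_values) / n
    m3 = sum((v - mean) ** 3 for v in suv_values) / n
    m4 = sum((v - mean) ** 4 for v in suv_values) / n
    stdev = math.sqrt(m2)
    return {
        "SUVmax": max(suv_values),
        "SUVmean": mean,
        "SUVtotal": sum(suv_values) * voxel_volume_ml,  # assumed definition
        "Volume": n * voxel_volume_ml,
        "Variance": m2,
        "Stdev": stdev,
        "CoV": stdev / mean,
        "Skewness": m3 / m2 ** 1.5,
        "Kurtosis": m4 / m2 ** 2,  # equals 3 for a normal distribution
    }


# Hypothetical voxel values, all above the segmentation threshold of 15.
features = first_order_features([16.0, 18.0, 20.0, 22.0, 24.0])
```

Texture features (co-occurrence, run length, and neighborhood matrices) additionally require gray-level discretization and a spatial neighborhood, so they are left out of this sketch.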

Lesion-level SUV quantification
Low repeatability:
Feature    Test SUV   Retest SUV
SUVmax     48.2       28.8
SUVmean    22.8       19.4
SUVtotal   286.4      92.7
High repeatability:
Feature    Test SUV   Retest SUV
SUVmax     64.5       63.7
SUVmean    29.7       28.9
SUVtotal   453        478

Methods: Statistical analysis
Transforming measurements: distributions of measurements were skewed, warranting a natural-log transformation.
Measurement difference: taken between scans, within each lesion.
Measures of repeatability: coefficient of variation (CV) and intraclass correlation coefficient (ICC).
Subscripts: b = between lesions, w = within lesions.
(Bland 1996, Raunig 2014)
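The slide's formulas did not survive extraction, so the following is only a sketch of one common way to obtain these two measures from paired test-retest values. It assumes a one-way random-effects model on the natural-log scale with two replicates per lesion, a within-lesion CV of sqrt(exp(var_w) - 1), and ICC = var_b / (var_b + var_w). The function name and return keys are hypothetical. The example input is the pair of SUVmax lesions shown on the lesion-level slide.

```python
import math


def repeatability_stats(test, retest):
    """Within-lesion CV and one-way ICC from paired measurements (log scale).

    test, retest: equal-length lists of positive values, one pair per lesion.
    Needs at least two lesions and some variation in the data.
    """
    n = len(test)
    log_pairs = [(math.log(a), math.log(b)) for a, b in zip(test, retest)]
    diffs = [b - a for a, b in log_pairs]               # within-lesion differences
    lesion_means = [(a + b) / 2 for a, b in log_pairs]
    grand_mean = sum(lesion_means) / n

    # One-way ANOVA mean squares for k = 2 replicates per lesion.
    ms_within = sum(d * d for d in diffs) / (2 * n)
    ms_between = 2 * sum((m - grand_mean) ** 2 for m in lesion_means) / (n - 1)

    var_within = ms_within
    var_between = max((ms_between - ms_within) / 2, 0.0)  # clip negative estimates

    return {
        "var_within": var_within,
        "var_between": var_between,
        "cv": math.sqrt(math.exp(var_within) - 1),  # CV on the original scale
        "icc": var_between / (var_between + var_within),
    }


# SUVmax test/retest pairs taken from the lesion-level slide (two lesions only,
# far too few for a meaningful estimate; shown purely to exercise the function).
stats = repeatability_stats([48.2, 64.5], [28.8, 63.7])
```

A low CV together with an ICC near 1 corresponds to a highly repeatable feature.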

Coefficient of variation varies by feature

ICC varies by feature
b: between lesions; w: within lesions

Repeatability of NaF PET/CT imaging features
[Figure: ICC vs. CV]

Features of high repeatability
[Figure: ICC vs. CV]

Inter-site CV is generally consistent

Repeatability across sites
[Figure: x error bars show the range of ICC across sites; y error bars show the range of CV across sites]

Metrics of high repeatability across sites
[Figure: x error bars show the range of ICC across sites; y error bars show the range of CV across sites]

Determining confidence intervals
95% confidence intervals developed from test-retest measurements can be applied to untransformed data for establishing response criteria (Bland 1996).
The intervals are computed from the log-transformed measurement differences and reported as 95% confidence intervals (CI95%).
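One way to read this step is sketched below: take the log differences between retest and test, form mean ± 1.96 standard deviations, and exponentiate so the interval becomes a ratio on the untransformed scale. The slide's exact formula is not preserved, so the 1.96 multiplier, the use of the sample standard deviation, and both function names are assumptions.

```python
import math
import statistics


def ratio_confidence_interval(test, retest, z=1.96):
    """Back-transformed 95% interval for the retest/test ratio.

    Returns (center, lower, upper) as ratios. Needs at least two lesion pairs.
    """
    diffs = [math.log(b) - math.log(a) for a, b in zip(test, retest)]
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation
    return (
        math.exp(mean_d),
        math.exp(mean_d - z * sd_d),
        math.exp(mean_d + z * sd_d),
    )


def exceeds_test_retest_variability(baseline, follow_up, lower, upper):
    """True if the follow-up/baseline ratio falls outside the interval."""
    ratio = follow_up / baseline
    return ratio < lower or ratio > upper


# With an interval of [0.80, 1.20], a rise from SUV 10 to SUV 15 (ratio 1.5)
# lies outside test-retest variability, while a rise to SUV 11 does not.
changed = exceeds_test_retest_variability(10.0, 15.0, 0.80, 1.20)
```

Because the interval is a ratio, it applies directly to untransformed measurements: a change is treated as real only when the follow-up to baseline ratio leaves the interval.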

Confidence intervals of SUV metrics by site
[Figure: 95% confidence interval (ratio) of SUVmax, SUVmean, and SUVtotal, plotted by site (1, 2, 3, and pooled)]
E.g., a CI95% of 1.00 [0.80, 1.20] indicates a 95% confidence interval of ±20%.

Summary: Repeatability of 18F-NaF PET
Quantified the repeatability of 54 18F-NaF PET-derived standardized uptake value (SUV) metrics and texture features for individual lesions.
High repeatability:
SUV metrics: SUVmean, SUVtotal, SUVmax
First-order: energy, entropy, median, variance
Neighborhood gray-tone difference matrix: coarseness, contrast-NGL
Low repeatability: kurtosis, skewness
Evaluated the variability of 18F-NaF PET imaging across multiple centers: metrics with high repeatability were consistent between sites.
Established response criteria for 18F-NaF PET-based treatment assessment.
Future work: determine the repeatability of 18F-NaF PET by the spatial location of the metastasis.
Contact: Christie Lin, clin232@wisc.edu

References
American Cancer Society. Cancer Facts & Figures 2014. Atlanta, GA: American Cancer Society; 2014.
Bland JM, Altman DG. Statistics notes: Transformations, means, and confidence intervals. BMJ. 1996;312(7038):1079.
Bubendorf L, Schöpfer A, Wagner U, et al. Metastatic patterns of prostate cancer: an autopsy study of 1,589 patients. Hum Pathol. 2000;31(5):578-583.
Galavis P, et al. Variability of textural features in FDG PET images due to different acquisition modes and reconstruction parameters. Acta Oncologica. 2010;49:1012-16.
Huang X, Chau CH, Figg WD. Challenges to improved therapeutics for metastatic castrate resistant prostate cancer: from recent successes and failures. J Hematol Oncol. 2012;5:35.
Kurdziel K, et al. The Kinetics and Reproducibility of 18F-Sodium Fluoride for Oncology Using Current PET Camera Technology. J Nucl Med. 2012.
Leijenaar R, et al. Stability of FDG-PET Radiomics features: An integrated analysis of test-retest and interobserver variability. Acta Oncologica. 2013;52:1391-1397.
Mick C, et al. Molecular Imaging in Oncology: 18F-Sodium Fluoride PET Imaging of Osseous Metastatic Disease. AJR. 2014.
Raunig D, et al. Quantitative Imaging Biomarkers: a Review of Statistical Methods for Technical Performance Assessment. SMMR. 2014.
Schirrmeister H, et al. Prospective evaluation of the clinical value of planar bone scans, SPECT, and (18)F-labeled NaF PET in newly diagnosed lung cancer. J Nucl Med. 2001;42(12):1800-4.
Tixier F, et al. Reproducibility of tumor uptake heterogeneity characterization through textural feature analysis in 18F-FDG PET. J Nucl Med. 2012;53(5):693-700. doi:10.2967/jnumed.111.099127.
Vaz S, et al. The Case for Using the Repeatability Coefficient When Calculating Test-Retest Reliability. doi:10.1371/journal.pone.0073990.
Yip S, Jeraj R. Use of articulated registration for response assessment of individual metastatic bone lesions. Phys Med Biol. 2014;59:1501. doi:10.1088/0031-9155/59/6/1501.

Repeatability of SUVmax: distribution RC= 95% LOA = [-0.27, +0.27]

Method: Image Acquisition
Scanner: scans at Centers 1 and 2 were acquired on the General Electric Discovery VCT scanner; scans at Center 3 were acquired on the Philips Gemini scanner.
Acquisition: whole-body scan 60 minutes post injection, 3 minutes per bed position, from the base of the skull to the proximal femora.
Reconstruction, Centers 1 and 2: 3D ordered subset expectation maximization (OSEM), 256 × 256 grid size, 14 subsets, 2 iterations, and a 4 mm post-reconstruction filter.
Reconstruction, Center 3: 3D OSEM, 144 × 144 grid size, 33 subsets, and 2 iterations.

Articulated Registration Algorithm (Yip, Jeraj 2014)

Results: coefficient of variation
[Figure panels: kurtosis, SUVmax]

Statistical analysis: measurement error indices
Log transform: to approximate a normal distribution (Bland 1996).
Relative mean difference (RMD): relative difference between the paired measurements.
Bland-Altman (B-A) plots: show trends in variability over the measuring interval.
Repeatability coefficient (RC): least significant difference between two repeated measurements.
95% limits of agreement (LOA): 95% interval in which the difference is expected to lie.
Coefficient of variation (CV): within-lesion variance.
Intraclass correlation coefficient (ICC): relative variance.
(Vaz et al 2013, QIBA 2014)
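The indices listed above can be sketched in a few lines for paired test-retest data. The formulas here are standard forms and not necessarily the ones used in the study: RMD is assumed to be the mean of (retest - test) divided by the pair mean, expressed in percent; the LOA is mean ± 1.96 standard deviations of the log differences; and RC is 1.96 × sqrt(2) × the within-lesion standard deviation on the log scale. The function name and return keys are hypothetical.

```python
import math
import statistics


def measurement_error_indices(test, retest):
    """RMD (%), 95% LOA, and RC for paired measurements (needs >= 2 pairs)."""
    n = len(test)
    pairs = list(zip(test, retest))

    # Relative mean difference in percent (assumed definition).
    rmd_percent = 100 * statistics.mean(
        [(b - a) / ((a + b) / 2) for a, b in pairs]
    )

    # Differences on the natural-log scale.
    log_diffs = [math.log(b) - math.log(a) for a, b in pairs]
    mean_d = statistics.mean(log_diffs)
    sd_d = statistics.stdev(log_diffs)

    # 95% limits of agreement: interval expected to contain a new difference.
    loa = (mean_d - 1.96 * sd_d, mean_d + 1.96 * sd_d)

    # Within-lesion variance for two replicates, then the repeatability coefficient.
    var_within = sum(d * d for d in log_diffs) / (2 * n)
    rc = 1.96 * math.sqrt(2 * var_within)

    return {"rmd_percent": rmd_percent, "loa": loa, "rc": rc}


# Hypothetical data in which every retest value is 10% above its test value.
indices = measurement_error_indices([10.0, 20.0, 40.0], [11.0, 22.0, 44.0])
```

When the mean log difference is close to zero, the LOA is roughly symmetric about zero and its half-width is close to RC.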

SUVmax: Inter-site measurement error indices suggest high repeatability
b: between site; w: within site

Relative mean difference (RMD) varies significantly by feature
RMD(SUVmax) = 0.05%
Are the